Abstract — Exponential error bounds for the finite-alphabet 
interference channel (IFC) with two transmitter-receiver pairs 
are investigated under the random coding regime. Our focus is 
on optimum decoding, as opposed to heuristic decoding rules that 
have been used in previous works, like joint typicality decoding, 
decoding based on interference cancellation, and decoding that 
considers the interference as additional noise. Indeed, the fact 
that the actual interfering signal is a codeword and not an 
i.i.d. noise process complicates the application of conventional 
techniques to the performance analysis of the optimum decoder. 
Using analytical tools rooted in statistical physics, we derive a 
single letter expression for error exponents achievable under 
optimum decoding and demonstrate strict improvement over 
error exponents obtainable using suboptimal decoding rules, but 
which are amenable to more conventional analysis. 

Index Terms — Error exponent region, large deviations, method 
of types, statistical physics. 



I. Introduction 

The M-user interference channel (IFC) models the commu- 
nication between M transmitter-receiver pairs, wherein each 
receiver must decode its corresponding transmitter's message 
from a signal that is corrupted by interference from the other 
transmitters, in addition to channel noise. The information 
theoretic analysis of the IFC was initiated over 30 years ago 
and has recently witnessed a resurgence of interest, motivated 
by new potential applications, such as wireless communication 
over unregulated spectrum. 

Previous work on the IFC has focused on obtaining inner 
and outer bounds to the capacity region for memoryless 
interference and noise, with a precise characterization of the 
capacity region remaining elusive for most channels, even for 
M = 2 users. The best known inner bound for the IFC is the 
Han-Kobayashi (HK) region, established in [1]. It has been 
found to be tight in certain special cases ([1], [2]), and recently 
was found to be tight to within 1 bit for the two-user Gaussian 
IFC [3]. No achievable rates that lie outside the HK region are 
known for any IFC with M = 2 users. 

Our aim in this paper is to extend the study of achievable 
schemes to the analysis of error exponents, or exponential 
rates of decay of error probabilities, that are attainable as a 
function of user rates. To our knowledge, there has been no 
prior treatment of error exponents for the IFC. In particular, 
the error bounds underlying the achievability results in [1] 
yield vanishing error exponents (though still decaying error 
probability) at all rates. 

The notion of an error exponent region, or a set of achiev- 
able exponential rates of decay in the error probabilities for 
different users at a given operating rate-tuple in a multi-user 
communication network, was formalized recently in [4], and 
studied therein for Gaussian multiple access and broadcast 
channels. Our main result, presented in Section IV, is a single 
letter characterization of an achievable error exponent region, 
as a function of user rates, for the M = 2 user finite alphabet, 
memoryless interference channel. The region is derived by 
bounding the average error probability of random codebooks 
comprised of i.i.d. codewords uniformly distributed over a type 
class, under maximum likelihood (ML) decoding at each user. 
Unlike the single user setting, in this case, the effective channel 
determining each receiver's ML decoding rule is induced both 
by the noise and the interfering user's codebook. Our focus 
on optimal decoding is a departure from the conventional 
achievability arguments in [1] and elsewhere, which are based 
on joint-typicality decoding, with restrictions on the decoder 
to "treat interference as noise" or to "decode the interference" 
in part or in whole. However, in this work, we confine our 
analysis to codebook ensembles that are simpler than the 
superposition codebooks of [1]. 

The analysis of the probability of decoding error under 
optimal decoding is complicated due to correlations induced 
by the interfering signal. Usual methods for bounding the 
probability of error based on Jensen's inequality and other 
related inequalities (see, e.g., (8) below) fail to give good 
results. Our bounding approach combines some of the classical 
information theoretic approaches of [5] and [6] with an 
analytical technique from statistical physics that was applied 
recently to the study of single user channels in [7], [8]. More 
specifically, as in [5], we use auxiliary parameters $\rho$ and $\lambda$ 
to get an upper bound on the average probability of decoding 
error under ML decoding, which we then bound using the 
method of types [6]. Key in our derivation is the use of 
distance enumerators in the spirit of [7] and [8], which allows 
us to avoid using Jensen's inequality in some steps, and allows 
us to maintain exponential tightness in other inequalities by 
applying them to only polynomially few terms (as opposed to 
exponentially many) in certain sums that bound the probability 
of decoding error. It should be emphasized, in this context, that 
the use of this technique was pivotal to our results. Our earlier 
attempts, which were based on more 'traditional' tools, failed to 
provide meaningful results. In fact, they all turned out to be 
inferior to some trivial bounds. 

The paper is organized as follows. The notation, various 
definitions, and the channel model assumed throughout the 
paper are detailed in Section II. In Section III, we derive an 
"easy" set of attainable error exponents which we shall treat as 
a benchmark for the exponents of the main section, Section IV. 
The "easy" exponents are obtained by simple extensions to 
the interference channel of existing error exponent results for 
single user and multiple access channels, based on random 
constant composition codebooks and suboptimal decoders. 
Then, in Section IV, we derive another set of attainable 
exponents by analyzing ML decoding for the channel induced 
by the interfering codebook. In Section V, we show that the 
minimizations required to evaluate the new error exponents can 
be written as convex optimization problems, and, as a result, 
can be solved efficiently. We follow this up in Section VI with 
a numerical comparison of the new exponents with the baseline 
exponents of Section III for a simple IFC. These numerical 
results demonstrate that the new exponents are never worse 
(at least for the chosen channel and parameters) and, for most 
rates, strictly improve over the baseline exponents. 

An earlier version of this work was presented in [9]. 

II. Notation, Definitions, and Channel Model 

Unless otherwise stated, we use lowercase and uppercase 
letters for scalars, boldface lowercase letters for vectors, 
uppercase (boldface) letters for random variables (vectors), 
and calligraphic letters for sets. For example, $a$ is a scalar, 
$\mathbf{v}$ is a vector, $X$ is a random variable, $\mathbf{X}$ is a random 
vector, and $\mathcal{S}$ is a set. For a real number $a$ we shall, 
on occasion, let $\bar{a}$ denote $1 - a$. Also, we use $\log(\cdot)$ to 
denote natural logarithm, $E$ to denote expectation, and $\Pr$ 
to denote probability. For independent random variables $X$ 
and $Y$ distributed according to $P_{X,Y}(x,y) = P_X(x)P_Y(y)$, 
$(x,y) \in \mathcal{X} \times \mathcal{Y}$, we denote the conditional expectation operator 
$E_X(\cdot)$ as $E_X(f(X,Y)) = \sum_{x \in \mathcal{X}} f(x,Y)P_X(x)$ for any 
function $f(\cdot,\cdot)$. All information quantities (entropy, mutual 
information, etc.) and rates are in nats. Finally, we use $\doteq$, 
$\dot{\le}$, etc., to denote equality or inequality to the first order 
in the exponent, i.e., $a_n \doteq b_n \Leftrightarrow \lim_{n\to\infty} \frac{1}{n}\log\frac{a_n}{b_n} = 0$; 
$a_n \,\dot{\le}\, b_n \Leftrightarrow \limsup_{n\to\infty} \frac{1}{n}\log\frac{a_n}{b_n} \le 0$. 

The empirical probability mass function of the finite alphabet 
sequence $\mathbf{v} = (v(1), \ldots, v(n))$ with alphabet $\mathcal{V}$ is 
denoted as the vector $\{\hat{P}_{\mathbf{v}}(v),\ v \in \mathcal{V}\}$, where each $\hat{P}_{\mathbf{v}}(v)$ 
is the relative frequency of the letter $v$ along $\mathbf{v}$, i.e., the 
fraction of indices $i$ with $v(i) = v$. The type 
class associated with an empirical probability mass function 
$P$, which will be denoted by $T_P$, is the set of all $n$-vectors 
$\{\mathbf{v}\}$ whose empirical probability mass function is equal to $P$. 
Similar conventions will apply to pairs and triples of vectors 
of length $n$, which are defined over the corresponding product 
alphabets. Information measures pertaining to empirical 
distributions will be denoted using the standard notational 
conventions, except that we use "^" as well as subscripts that 
indicate the sequences from which these empirical distributions 
were extracted. For example, we write $\hat{H}_{\mathbf{x}\mathbf{y}\mathbf{z}}(X,Y|Z)$ 
and $\hat{I}_{\mathbf{x}\mathbf{y}\mathbf{z}}(X,Y;Z)$ to denote the conditional entropy of 
$(X,Y)$ given $Z$ and the mutual information between $(X,Y)$ 
and $Z$, respectively, computed with respect to the empirical 
distribution $\hat{P}_{\mathbf{x}\mathbf{y}\mathbf{z}}(x,y,z)$. We denote the relative entropy 
or Kullback-Leibler divergence between distributions $P_X$ and 
$P_Y$ as $D(P_X\|P_Y) = \sum_x P_X(x)\log(P_X(x)/P_Y(x))$, and 
we write $D(P_{X|Z}\|P_{Y|Z}|P_Z)$ for the conditional relative 
entropy between conditional distributions $P_{X|Z}$ and $P_{Y|Z}$ 
conditioned on $P_Z$, which is defined as $D(P_{X|Z}\|P_{Y|Z}|P_Z) = 
\sum_{x,z} P_Z(z)P_{X|Z}(x|z)\log(P_{X|Z}(x|z)/P_{Y|Z}(x|z))$. 
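
The empirical quantities defined above are straightforward to compute for short sequences. The following is a minimal sketch in Python (standard library only); the function names `empirical_pmf`, `marginal`, `entropy`, `mutual_information` and `kl_divergence` are illustrative choices, not notation from the paper, and all quantities are in nats as in the text.

```python
import math
from collections import Counter

def empirical_pmf(*seqs):
    """Joint empirical pmf (type) of one or more equal-length sequences."""
    n = len(seqs[0])
    counts = Counter(zip(*seqs))
    return {sym: c / n for sym, c in counts.items()}

def marginal(pmf, idx):
    """Marginal of a joint pmf over the coordinates listed in idx."""
    out = {}
    for sym, p in pmf.items():
        key = tuple(sym[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(pmf):
    """Entropy in nats of a pmf given as {symbol: probability}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def mutual_information(pmf, a, b):
    """I(A;B) in nats; a and b are lists of coordinate indices."""
    return (entropy(marginal(pmf, a)) + entropy(marginal(pmf, b))
            - entropy(marginal(pmf, a + b)))

def kl_divergence(p, q):
    """D(p||q) in nats; q must be positive wherever p is."""
    return sum(pi * math.log(pi / q[s]) for s, pi in p.items() if pi > 0)

# Three length-4 binary sequences: y copies x, z is independent of (x, y).
x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
z = [0, 1, 0, 1]
P_xyz = empirical_pmf(x, y, z)
I_xy = mutual_information(P_xyz, [0], [1])       # empirical I(X;Y)
I_xy_z = mutual_information(P_xyz, [0, 1], [2])  # empirical I(X,Y;Z)
```

In this toy example the empirical mutual information between x and y equals log 2 nats, while (x, y) and z are empirically independent.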

We continue with a formal description of the two-user 
IFC setting. Let $\mathbf{x}_i = (x_i(1), \ldots, x_i(n)) \in \mathcal{X}_i^n$, $i = 1,2$, 
denote the channel input signals of the two transmitters, and 
let $\mathbf{y}_i = (y_i(1), \ldots, y_i(n)) \in \mathcal{Y}_i^n$ be the corresponding 
channel outputs received by decoders 1 and 2, where $\mathcal{X}_i$ 
and $\mathcal{Y}_i$ denote the input and output alphabets, which 
we assume to be finite. Each (random) output symbol pair 
$(Y_1(j), Y_2(j))$ is assumed to be conditionally independent 
of all other outputs, and all input symbols, given the two 
corresponding (random) input symbols $(X_1(j), X_2(j))$, and 
the corresponding conditional probability is assumed to be 
constant from symbol to symbol. An $(n, R_1, R_2)$ code for 
the IFC consists of pairs of encoding and decoding functions, 
$(f_1, f_2)$ and $(g_1, g_2)$, respectively, where $f_i : \{1, \ldots, M_i\} \to 
\mathcal{X}_i^n$, $M_i = \lceil e^{nR_i} \rceil$, and $g_i : \mathcal{Y}_i^n \to \{1, \ldots, M_i\}$, $i = 1,2$. 
The performance of the code is characterized by a pair of 
error probabilities $P_{e,i} = \Pr(\hat{W}_i \ne W_i)$, $i = 1,2$, where 
$\hat{W}_i = g_i(\mathbf{Y}_i)$ and $\mathbf{Y}_i$ is the random output when user $i$ 
transmits $\mathbf{X}_i = f_i(W_i)$, assuming the messages $W_i$ are 
uniformly distributed on the sets of indices $\{1, 2, \ldots, M_i\}$, 
$i = 1,2$. The per user error probabilities depend on the 
channel only through the marginal conditional distributions 
of the channel outputs given the corresponding channel input 
pairs. We shall denote these conditional distributions as 
$q_i(y|x_1,x_2) = \Pr(Y_i(j) = y \mid (X_1(j), X_2(j)) = (x_1, x_2))$. 

A pair of error exponents $(E_1, E_2)$ is attainable at a rate 
pair $(R_1, R_2)$ if there is a sequence of $(n, R_1, R_2)$ codes 
satisfying $E_i \le \liminf_{n\to\infty} -(1/n)\log P_{e,i}$ for $i = 1,2$. The 
set of all attainable error exponents at $(R_1, R_2)$ comprises the 
error exponent region at $(R_1, R_2)$ and we shall denote it as 
$\mathcal{E}(R_1, R_2)$. The main result of this paper is a single letter 
characterization of a non-trivial subset of $\mathcal{E}(R_1, R_2)$ for each 
rate pair $(R_1, R_2)$. 

III. Background 

In this section, we present achievable error exponents for 
the interference channel which are based on known results of 
error exponents for single user and multiple access channels 
(MAC) for fixed composition codebooks [12], [13], ifTTl . 
These exponents will be used as a baseline for comparing the 
performance of the error exponents that we derive in Section 
IV. 

In the following, we will focus on the error performance of 
user 1, and as a result, all explanations and expressions will 
be specialized to receiver 1. Similar expressions also hold for 
user 2 with the exchange of indices $1 \leftrightarrow 2$. 

A possibly suboptimal decoder for the interference channel 
can be obtained from a given multiple access channel decoder 
by simply ignoring the decoded message of the interfering 
transmitter. For example, following [13], we can use a minimum 
entropy decoder that for a given received vector $\mathbf{y}_1$ at 
receiver 1 computes $(\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2)$ 

$$(\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2) = \arg\min_{(\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{C}_1 \times \mathcal{C}_2} \hat{H}_{\mathbf{x}_1\mathbf{x}_2\mathbf{y}_1}(X_1, X_2 | Y_1),$$

and throws away $\hat{\mathbf{x}}_2$. 

It follows from [13] that for random codebooks of fixed 
composition $Q_1, Q_2$, the average probability of decoding both 
messages in error, where the averaging is done over the 
random choice of codebooks, satisfies: 

$$\Pr(\hat{\mathbf{x}}_1 \ne \mathbf{x}_1, \hat{\mathbf{x}}_2 \ne \mathbf{x}_2) \le e^{-nE_{1,2}}$$

where 

$$E_{1,2} = \min_{P_{X_1X_2Y_1}:\ P_{X_1} = Q_1,\ P_{X_2} = Q_2} D(P_{Y_1|X_1X_2} \| q_1 | P_{X_1X_2}) + I(X_1;X_2) + \left| I(X_1;Y_1) + I(X_2;X_1,Y_1) - R_1 - R_2 \right|^+$$

with $|\cdot|^+ = \max\{\cdot, 0\}$. 

In addition, the average probability of decoding the message 
of the interfering transmitter correctly but the message of the 
desired transmitter incorrectly satisfies: 

$$\Pr(\hat{\mathbf{x}}_1 \ne \mathbf{x}_1, \hat{\mathbf{x}}_2 = \mathbf{x}_2) \le e^{-nE_{1|2}}$$

where 

$$E_{1|2} = \min_{P_{X_1X_2Y_1}:\ P_{X_1} = Q_1,\ P_{X_2} = Q_2} D(P_{Y_1|X_1X_2} \| q_1 | P_{X_1X_2}) + I(X_1;X_2) + \left| I(X_1;X_2,Y_1) - R_1 \right|^+ .$$

Therefore, the overall average error performance of this MAC 
decoder in the IFC satisfies: 

$$\Pr(\hat{\mathbf{x}}_1 \ne \mathbf{x}_1) \,\dot{\le}\, e^{-n\min\{E_{1,2},\, E_{1|2}\}}.$$

A second suboptimal decoder that leads to tractable error 
performance bounds is the single user maximum mutual 
information decoder (which in this case coincides with the 
minimum entropy decoder): 

$$\hat{\mathbf{x}}_1 = \arg\max_{\mathbf{x}_1 \in \mathcal{C}_1} \hat{I}_{\mathbf{x}_1\mathbf{y}_1}(X_1; Y_1).$$

In this case, standard application of the method of types ifTTl 
leads to the following bound on the average error probability 
under random fixed composition codebooks of types $Q_1, Q_2$: 

$$\Pr(\hat{\mathbf{x}}_1 \ne \mathbf{x}_1) \le e^{-nE_1},$$

where 

$$E_1 = \min_{P_{X_1X_2Y_1}:\ P_{X_1} = Q_1,\ P_{X_2} = Q_2} D(P_{Y_1|X_1X_2} \| q_1 | P_{X_1X_2}) + I(X_1;X_2) + \left| I(X_1;Y_1) - R_1 \right|^+ .$$

Between these two decoders, we can choose the one that 
leads to the better error performance. Therefore, we obtain 
that 

$$E_{B,1} = \max\{E_1;\ \min\{E_{1,2};\ E_{1|2}\}\} \qquad (1)$$

is an achievable error exponent at receiver 1, with an analogous 
exponent following for receiver 2. 



IV. Main Result 

Our main contribution is stated in the following theorem, 
which presents a new error exponent region for the discrete 
memoryless two-user IFC. While the full proof appears in 
Appendix A, we also provide a proof outline below, to give 
an idea of the main steps. 

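Theorem 1: For a discrete memoryless two-user IFC as 
defined in Section II, for a family of block codes of rates $R_1$ 
and $R_2$, a decoding error probability for user 1 satisfying 

$$\liminf_{n\to\infty} -\frac{1}{n}\log P_{e,1}(n) \ge E_{R,1}(R_1, R_2, Q_1, Q_2, \rho, \lambda) \qquad (2)$$

can be achieved as the block length of the codes $n$ goes to 
infinity, where the error exponent $E_{R,1}(R_1, R_2, Q_1, Q_2, \rho, \lambda)$ 
is given by 

$$E_{R,1}(R_1,R_2,Q_1,Q_2,\rho,\lambda) = R_2 - \rho R_1 + \min\Big\{ \min_{(P_{X_1X_2Y_1},\, P_{X_1'X_2'Y_1'}) \in \mathcal{S}_1(Q_1,Q_2)} f_1(\rho,\lambda,P_{X_1X_2Y_1},P_{X_1'X_2'Y_1'});\ \min_{(P_{X_1X_2Y_1},\, P_{X_1'X_2'Y_1'}) \in \mathcal{S}_2(Q_1,Q_2,R_2)} f_2(\rho,\lambda,P_{X_1X_2Y_1},P_{X_1'X_2'Y_1'}) \Big\} \qquad (3)$$

where 

$$f_1 \triangleq g(\rho,\lambda,P_{X_1X_2Y_1},P_{X_1'X_2'Y_1'}) - H(Y_1|X_1) + \rho I(X_1';Y_1') + \max\big\{ I(X_2;X_1,Y_1) - R_2;\ \overline{\rho\lambda}\,(I(X_2;X_1,Y_1) - R_2) \big\} + \max\big\{ \bar{\rho} I(X_2';Y_1') + \rho I(X_2';X_1',Y_1') - R_2;\ \rho(I(X_2';X_1',Y_1') - R_2);\ \rho\lambda(I(X_2';X_1',Y_1') - R_2) \big\} \qquad (4)$$

$$f_2 \triangleq g(\rho,\lambda,P_{X_1X_2Y_1},P_{X_1'X_2'Y_1'}) - H(Y_1|X_1) + \rho I(X_1';X_2',Y_1') + I(X_2;X_1,Y_1) - R_2 \qquad (5)$$

with 

$$g \triangleq -\overline{\rho\lambda}\, E_{X_1X_2Y_1} \log q_1(Y_1|X_1,X_2) - \rho\lambda\, E_{X_1'X_2'Y_1'} \log q_1(Y_1'|X_1',X_2')$$

and 

$$\mathcal{S}_1(Q_1,Q_2) \triangleq \big\{ (P_{X_1X_2Y_1}, P_{X_1'X_2'Y_1'}) \in \mathcal{S}^2 :\ P_{Y_1} = P_{Y_1'},\ P_{X_1} = P_{X_1'} = Q_1,\ P_{X_2} = P_{X_2'} = Q_2 \big\} \qquad (6)$$

$$\mathcal{S}_2(Q_1,Q_2,R_2) \triangleq \big\{ (P_{X_1X_2Y_1}, P_{X_1'X_2'Y_1'}) \in \mathcal{S}^2 :\ P_{X_1} = P_{X_1'} = Q_1,\ P_{X_2} = P_{X_2'} = Q_2,\ R_2 \le I(X_2;Y_1),\ P_{X_2Y_1} = P_{X_2'Y_1'} \big\} \qquad (7)$$

where $\mathcal{S}$ is the probability simplex in $\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}_1$. In the 
bound (2), $(\rho, \lambda) \in [0,1]^2$ can be chosen to maximize the error 
exponent $E_{R,1}$. 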

In the expressions above, $Q_1$ and $Q_2$ are probability 
distributions defined over the alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively. 
Expressions for the error probability $P_{e,2}$ and error exponent 
$E_{R,2}$ equivalent to (2) and (3) can be stated for the receiver of 
user 2 by replacing $X_1 \leftrightarrow X_2$, $Y_1 \to Y_2$, and $q_1 \to q_2$ in all 
the expressions. By varying $Q_1$ and $Q_2$ over all probability 
distributions in $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively, we obtain the error 
exponent region for fixed rates $R_1$ and $R_2$. 

Remark 1: A lower bound to $E^*_{R,1} = \max_{\rho,\lambda} E_{R,1}(R_1, 
R_2, Q_1, Q_2, \rho, \lambda)$ is derived in Appendix B (cf. equation (B.4)) 
that is closer in form to the expressions underlying the benchmark 
exponent $E_{B,1}$ presented above. In particular, this lower 
bound allows us to establish analytically (see Appendix B) 
that $E_{B,1} \le E^*_{R,1}$ at $R_1 = 0$ (and for sufficiently small $R_1$). 
Numerical computations, as presented in Section VI, indicate 
that this inequality can be strict. 

A second application of the lower bound (B.4) is to determine 
the set of rate pairs $(R_1, R_2)$ for which $E^*_{R,1} > 0$. We 
show in Appendix B that this region includes 

$$\mathcal{R}_1 = \{R_1 < I(X_1;Y_1)\} \cup \big\{ \{R_1 + R_2 < I(Y_1;X_1,X_2)\} \cap \{R_1 < I(X_1;Y_1|X_2)\} \big\},$$

with an analogous region following for the set where $E^*_{R,2} > 0$ 
(see Fig. 1). 

[Figure 1: rate region plot; axis marks at $I(X_2;Y_1)$, $I(X_1;Y_1)$ and $I(X_1;Y_1|X_2)$, horizontal axis $R_1$.] 
Fig. 1. Rate region $\mathcal{R}_1$ where $E^*_{R,1} > 0$. 

Furthermore, it is shown in [11] that the error exponent 
achievable for user 1 with optimal decoding and random 
fixed composition codebooks is zero outside the closure of 
the region $\mathcal{R}_1$. This result, together with our contribution, 
characterizes the rate region where the attainable exponents 
with random constant composition codebooks are positive. 
Finally, it can be shown that this region is contained in the 
HK region ifTTIl . 

Remark 2: Theorem 1 presents an asymptotic upper bound 
on the average probability of decoding error for fixed composition 
codebooks, where the averaging is done over the random 
choice of codebooks. It is straightforward to show (see, e.g., 
[4]) that there exists a specific (i.e., non-random) sequence of 
fixed composition codebooks of increasing block length $n$ for 
which the same asymptotic error performance can be achieved. 

Proof Outline. For $n$ non-negative reals $a_1, \ldots, a_n$ and $b \in 
[0,1]$, the following inequality [5, Problem 4.15(f)] will be 
frequently used: 

$$\Big( \sum_{i=1}^{n} a_i \Big)^b \le \sum_{i=1}^{n} a_i^b \qquad (8)$$
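
As a quick sanity check of the inequality $(\sum_i a_i)^b \le \sum_i a_i^b$ for non-negative $a_i$ and $b \in [0,1]$, the following sketch evaluates the gap between its two sides on random inputs. The helper name `power_sum_gap` is hypothetical and the snippet is only a numerical illustration, not a proof.

```python
import random

def power_sum_gap(a, b):
    """Return sum(a_i**b) - (sum(a_i))**b, which is non-negative for b in [0, 1]."""
    return sum(x ** b for x in a) - sum(a) ** b

random.seed(0)
# 1000 random trials with five non-negative terms and a random exponent b.
gaps = [
    power_sum_gap([random.random() for _ in range(5)], random.random())
    for _ in range(1000)
]
smallest_gap = min(gaps)
```

The bound is loose by at most a factor equal to the number of terms, which is why it preserves exponential tightness only when it is applied to sums with polynomially many terms.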



For a given block length $n$, we generate the codebook of 
user $i = 1,2$ by choosing $M_i$ sequences $\mathbf{x}_i$ of length $n$ 
independently and uniformly over all the sequences of length 
$n$ and type $Q_i$ in $\mathcal{X}_i^n$. Note that $Q_i$, $i = 1,2$, have rational 
entries with denominator $n$. We will write $\mathbf{x}_{i,j}$ to denote the 
$j$-th codeword of user $i$. 
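
The codebook construction just described can be sketched as follows for a binary alphabet: a uniformly random permutation of a fixed sequence of type $Q_i$ is uniformly distributed over the type class, so each codeword is drawn by shuffling. The function name, the parameters `n`, `ones`, `rate`, and the use of the ceiling for the codebook size are illustrative assumptions.

```python
import math
import random

def constant_composition_codebook(n, ones, rate, rng):
    """Draw ceil(e^{n*rate}) binary codewords i.i.d. uniformly from the type
    class with exactly `ones` ones among n symbols (codewords may repeat)."""
    num_codewords = math.ceil(math.exp(n * rate))
    base = [1] * ones + [0] * (n - ones)
    codebook = []
    for _ in range(num_codewords):
        word = base[:]
        rng.shuffle(word)  # uniform random permutation of the base sequence
        codebook.append(tuple(word))
    return codebook

rng = random.Random(0)
# Block length 10, type Q(1) = 0.6, rate 0.2 nats per channel use.
codebook = constant_composition_codebook(n=10, ones=6, rate=0.2, rng=rng)
```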

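For a given channel output $\mathbf{y}_1 \in \mathcal{Y}_1^n$, the best decoding 
rule to minimize the probability of error in decoding 
the message of user 1 is ML decoding, which consists of 
picking the message $m$ which maximizes $P(\mathbf{y}_1|\mathbf{x}_{1,m}) = 
\sum_{i=1}^{M_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{x}_{1,m}, \mathbf{x}_{2,i})/M_2$. Letting 

$$q_{1,\mathcal{C}_2}^{(n)}(\mathbf{y}_1|\mathbf{x}_1) \triangleq \frac{1}{M_2} \sum_{i=1}^{M_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{x}_1, \mathbf{x}_{2,i}) \qquad (9)$$

be the "average" channel observed at receiver 1, where the 
averaging is done over the codewords of user 2 in $\mathcal{C}_2$, 
the decoding error probability at receiver 1 for transmitted 
codeword $\mathbf{x}_{1,m}$ and codebooks $\mathcal{C}_1$ and $\mathcal{C}_2$ is given by: 

$$P_{e,1}(\mathbf{x}_{1,m}, \mathcal{C}_1, \mathcal{C}_2) = \sum_{\mathbf{y}_1} P_{e,1}(\mathbf{x}_{1,m}, \mathcal{C}_1, \mathcal{C}_2 | \mathbf{y}_1)\, q_{1,\mathcal{C}_2}^{(n)}(\mathbf{y}_1|\mathbf{x}_{1,m}) \qquad (10)$$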

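With the introduction of the average channel (9), and the 
use of two auxiliary parameters $(\rho, \lambda) \in [0,1]^2$, we can follow 
the approach of [5] to bound the conditional probability of 
decoding error $P_{e,1}(\mathbf{x}_{1,m}, \mathcal{C}_1, \mathcal{C}_2 | \mathbf{y}_1)$. Taking expectation over 
the random choice of codebooks $\mathcal{C}_1$ and $\mathcal{C}_2$ we obtain an 
average error probability $\bar{P}_{e,1}$ satisfying: 

$$\bar{P}_{e,1} \le M_1^{\rho} \sum_{\mathbf{y}_1} E_{\mathcal{C}_2}\Big\{ E_{\mathbf{X}_1}\Big[ \big(q_{1,\mathcal{C}_2}^{(n)}(\mathbf{y}_1|\mathbf{X}_1)\big)^{\overline{\rho\lambda}} \Big] \cdot \Big( E_{\mathbf{X}_1}\Big[ \big(q_{1,\mathcal{C}_2}^{(n)}(\mathbf{y}_1|\mathbf{X}_1)\big)^{\lambda} \Big] \Big)^{\rho} \Big\} \qquad (11)$$

where we used Jensen's inequality to move the second expectation 
inside $(\cdot)^{\rho}$. 

Equation (11) is hard to handle, mainly due to the correlation 
introduced by $\mathcal{C}_2$ between the two factors inside the 
outer expectation. Furthermore, the evaluation of the inner 
expectations over $\mathbf{X}_1$ is complicated due to the powers $\overline{\rho\lambda}$ 
and $\lambda$ affecting $q_{1,\mathcal{C}_2}^{(n)}(\mathbf{y}_1|\mathbf{X}_1)$. Bounding methods based on 
Jensen's inequality and (8) fail to give good results due to the 
loss of exponential tightness. 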

We proceed with a refined bounding technique based on 
the method of types inspired by [7]. While in this approach 
we still use (8), we use it to bound sums with a number of 
terms that only grows polynomially with $n$, and as a result, 
exponential tightness is preserved. 

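Since the channel is memoryless, 

$$q_{1,\mathcal{C}_2}^{(n)}(\mathbf{y}_1|\mathbf{x}_1) = \frac{1}{M_2} \sum_{i=1}^{M_2} \prod_{t=1}^{n} q_1(y_1(t)|x_1(t), x_{2,i}(t)) = \frac{1}{M_2} \sum_{P_{X_1X_2Y_1}} N_{\mathbf{x}_1,\mathbf{y}_1}(P_{X_1X_2Y_1})\, e^{n E_{X_1X_2Y_1}[\log q_1(Y_1|X_1,X_2)]} \qquad (12)$$

where we used $N_{\mathbf{x}_1,\mathbf{y}_1}(P_{X_1X_2Y_1})$ to denote the number of 
codewords $\mathbf{x}_2$ in $\mathcal{C}_2$ such that $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}_1)$ have empirical 
distribution $P_{X_1X_2Y_1}$. We also used $E_{X_1X_2Y_1}(\cdot)$ to denote 
expectation with respect to the distribution $P_{X_1X_2Y_1}$. 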
Replacing (12) in (11) and using (8) three times, we obtain: 

$$\bar{P}_{e,1} \,\dot{\le}\, e^{n(\rho R_1 - R_2)} \sum_{P} \sum_{P'} \sum_{\mathbf{y}_1} E_{\mathcal{C}_2}\Big\{ E_{\mathbf{X}_1}\big[ N_{\mathbf{X}_1,\mathbf{y}_1}^{\overline{\rho\lambda}}(P) \big] \cdot E_{\mathbf{X}_1'}^{\rho}\big[ N_{\mathbf{X}_1',\mathbf{y}_1}^{\lambda}(P') \big] \Big\} \cdot e^{n[\overline{\rho\lambda} E_P \log q_1(Y_1|X_1,X_2) + \rho\lambda E_{P'} \log q_1(Y_1'|X_1',X_2')]} \qquad (13)$$

where we used $P = P_{X_1X_2Y_1}$ and $P' = P_{X_1'X_2'Y_1'}$ to shorten 
the expression. 

We next consider the bounding of 

$$E(\mathbf{y}_1, P, P') \triangleq E_{\mathcal{C}_2}\Big\{ E_{\mathbf{X}_1}\big[ N_{\mathbf{X}_1,\mathbf{y}_1}^{\overline{\rho\lambda}}(P) \big] \cdot E_{\mathbf{X}_1'}^{\rho}\big[ N_{\mathbf{X}_1',\mathbf{y}_1}^{\lambda}(P') \big] \Big\} \qquad (14)$$

and note that $N_{\mathbf{X}_1,\mathbf{y}_1}(P)$ and $N_{\mathbf{X}_1',\mathbf{y}_1}(P')$ are formed by 
sums of an exponentially large number of indicator functions, 
each of which takes value 1 with exponentially small probability. 
These sums concentrate around their means, which show 
different behavior depending on how the number of terms 
in the sum ($e^{nR_2}$) compares to the probability of each of 
the indicator functions taking value 1 (depending on the case 
considered, these probabilities take the form $e^{-nI(X_2;X_1,Y_1)}$, 
$e^{-nI(X_2';X_1',Y_1')}$, or $e^{-nI(X_2;Y_1)}$). Whenever one of the factors 
in (14) concentrates around its mean it behaves as a constant, 
and hence is uncorrelated with the remaining factor. As a 
result, the correlation between the two factors of (14), which 
complicates the analysis, can be circumvented. We give the 
details of this part of the derivation in Appendix A, but note 
here that the resulting bound on $E(\mathbf{y}_1, P, P')$ depends on 
$\mathbf{y}_1$ only through a factor $\mathbb{1}(\mathbf{y}_1 \in T_{P_{Y_1}} \cap T_{P_{Y_1'}};\ P_{X_1} = P_{X_1'} = 
Q_1;\ P_{X_2} = P_{X_2'} = Q_2)$. Therefore, the innermost sum in 
(13) can be evaluated by counting the number of vectors 
$\mathbf{y}_1 \in \mathcal{Y}_1^n$ that have empirical types $P_{Y_1}$ and $P_{Y_1'}$. Note 
that this count can only be positive for $P_{Y_1} = P_{Y_1'}$. This 
count is equal to $e^{nH(Y_1)}$ to first order in the 
exponent. Furthermore, the sums over $P$ and $P'$ in (13) have 
a number of terms that only grows polynomially with $n$. 
Therefore, to first order, the exponential growth rate of (13) 
equals the maximum exponential growth rate of the argument 
of the outer two sums, where the maximization is performed 
over the distributions $P$ and $P'$ which are rational, with 
denominator $n$. We can further upper bound the probability of 
error by enlarging the optimization region, maximizing over 
any probability distributions $P$, $P'$. 
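
The claim that the sums over types have only polynomially many terms can be illustrated by counting types directly. The number of empirical distributions with denominator $n$ on an alphabet of size $k$ is the number of ways to split $n$ into $k$ non-negative counts, and it is at most $(n+1)^k$. The helper name below is hypothetical; $k = 8$ corresponds to binary triples $(x_1, x_2, y_1)$.

```python
from math import comb

def num_types(n, k):
    """Number of types with denominator n on an alphabet of size k
    (compositions of n into k non-negative integer counts)."""
    return comb(n + k - 1, k - 1)

# Joint types of binary triples (x1, x2, y1): alphabet size k = 8.
counts = {n: num_types(n, 8) for n in (10, 100, 1000)}
# The number of length-n sequences over the same alphabet grows as 8**n.
polynomial_bound_holds = all(num_types(n, 8) <= (n + 1) ** 8 for n in range(1, 500))
```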

V. Convex Optimization Issues 

In order to get a valid evaluation of $E_{R,1}(R_1, R_2, Q_1, 
Q_2, \rho, \lambda)$, for any given $Q_1, Q_2, \rho, \lambda$ satisfying the constraints 
of the outer maximization, we need to accurately solve the 
inner minimization problems. A brute force search may not 
give accurate enough results in reasonable time. As will be 
shown below, the first minimization problem in (3) is a convex 
problem, and as a result, it can be solved efficiently. 
In addition, convexity allows us to lower bound the objective 
function by its supporting hyperplane, which, in turn, allows 
us to get a reliable¹ lower bound through the solution of a linear 
program. 

The second minimization problem is not convex due to the 
non-convex constraint $R_2 \le I(X_2;Y_1)$. If we remove this 
constraint, it will be later shown that we obtain a convex 
problem that can be solved efficiently. There are two possible 
situations: 

The first situation occurs when the optimal solution to the 
modified problem satisfies $R_2 \le I(X_2;Y_1)$: in this case, the 
solution to the modified problem is also a solution to the 
original problem. 

The second situation is when the optimal solution to the 
modified problem satisfies $R_2 > I(X_2;Y_1)$: in this case, a 
solution to the original problem must satisfy $R_2 = I(X_2;Y_1)$. 
We prove this statement by contradiction. Let $P_1^*$ be the 
optimal solution to the modified problem, and $P_2^*$ be an 
optimal solution to the original problem. Now assume, 
conversely, that there is no $P_2^*$ that satisfies $R_2 = I(X_2;Y_1)$. 
With this assumption, we have that at $P_2^*$, $R_2 < I(X_2;Y_1)$. 
Let $\mathcal{D} \triangleq \{P = (P_{X_1X_2Y_1}, P_{X_1'X_2'Y_1'}) : P_{X_1} = P_{X_1'} = 
Q_1,\ P_{X_2} = P_{X_2'} = Q_2\}$. Note that $\mathcal{D}$ is a convex set 
and $P_1^*, P_2^* \in \mathcal{D}$. Due to the continuity of $I(X_2;Y_1)$, the 
straight line in $\mathcal{D}$ that joins $P_1^*$ and $P_2^*$ must pass through 
an intermediate point $\tilde{P} = \alpha P_1^* + (1 - \alpha)P_2^*$, $\alpha \in (0,1)$, 
that satisfies $I(X_2;Y_1) = R_2$. Let $f_2(\cdot)$ be the objective 
function of the second minimization problem in (3) restricted 
to $\mathcal{D}$. It will be shown later that $f_2(\cdot)$, restricted to this 
domain, is a convex function. By hypothesis, $f_2(\tilde{P}) > f_2(P_2^*)$ 
and we have $f_2(P_1^*) \le f_2(P_2^*) < f_2(\tilde{P})$. On the other 
hand, from the convexity of $f_2(\cdot)$, restricted to $\mathcal{D}$, we have 
$f_2(\tilde{P}) \le \alpha f_2(P_1^*) + (1 - \alpha)f_2(P_2^*) \le f_2(P_2^*)$ and we get a 
contradiction. Therefore, it follows that there is a solution $P_2^*$ 
to the original problem that satisfies $R_2 = I(X_2;Y_1)$. 

Let $f_1(\cdot)$ be the objective function of the first minimization 
problem in (3). First, we note that $P_2^*$ satisfies the constraints 
of the first minimization problem since they are less restrictive 
than the constraints of the second minimization problem in 
(3). We next prove that $f_1(P_2^*) = f_2(P_2^*)$. As a result, the 
optimal solution $P^*$ of the first minimization problem satisfies 
$f_1(P^*) \le f_1(P_2^*) = f_2(P_2^*)$, and we do not need to know 
$f_2(P_2^*)$ to evaluate the argument of the maximization in (3). 
Using the fact that at $P_2^*$, $I(X_2;Y_1) = I(X_2';Y_1') = R_2$, we 
have: 

$$f_2(P_2^*) - f_1(P_2^*) = \rho I(X_1';X_2',Y_1') - \rho I(X_1';Y_1') - \rho(I(X_2';X_1',Y_1') - R_2) = \rho\big[ I(X_2';X_1',Y_1') - I(X_2';Y_1') - I(X_2';X_1',Y_1') + R_2 \big] = 0, \qquad (15)$$

where we used the identity $I(X_1';X_2',Y_1') - I(X_1';Y_1') = 
I(X_2';X_1',Y_1') - I(X_2';Y_1')$ in the second equality. 

¹ In our implementation we solve the original convex optimization problem 
using the MATLAB function fmincon. 
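
The identity used in the second equality holds for any joint distribution, since both sides equal the conditional mutual information between the two inputs given the output. A small numerical check on a randomly drawn joint pmf over binary triples is sketched below; the helper names `H` and `I` and the coordinate convention are illustrative.

```python
import math
import random
from itertools import product

def H(pmf, idx):
    """Entropy in nats of the marginal of pmf on the coordinates in idx."""
    marg = {}
    for sym, p in pmf.items():
        key = tuple(sym[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log(p) for p in marg.values() if p > 0)

def I(pmf, a, b):
    """Mutual information in nats between coordinate groups a and b."""
    return H(pmf, a) + H(pmf, b) - H(pmf, a + b)

random.seed(1)
weights = [random.random() for _ in range(8)]
total = sum(weights)
# Coordinates: 0 -> X1', 1 -> X2', 2 -> Y1'.
pmf = {s: w / total for s, w in zip(product((0, 1), repeat=3), weights)}

lhs = I(pmf, [0], [1, 2]) - I(pmf, [0], [2])  # I(X1';X2',Y1') - I(X1';Y1')
rhs = I(pmf, [1], [0, 2]) - I(pmf, [1], [2])  # I(X2';X1',Y1') - I(X2';Y1')
```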

In summary, if the solution to the second minimization 
problem in (3), without the constraint on $R_2$, satisfies $R_2 > 
I(X_2;Y_1)$, then the first minimization problem in (3) dominates 
the expression. Otherwise, the solution to the second 
minimization problem in (3) without the constraint $R_2 \le 
I(X_2;Y_1)$ equals the solution to the second minimization 
problem with this constraint. 
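
The case analysis summarized above can be written as a small decision rule. In the sketch below, the three inputs are assumed to come from an external convex solver (for example the minimum of $f_1$, the minimum of $f_2$ with the mutual information constraint dropped, and the value of $I(X_2;Y_1)$ at that relaxed minimizer); the function and argument names are hypothetical.

```python
def inner_minimum(f1_min, f2_relaxed_min, mi_at_relaxed_solution, r2):
    """Value of the inner min{min f1, min f2} in (3), following the two
    situations described in the text."""
    if r2 <= mi_at_relaxed_solution:
        # The relaxed minimizer is feasible, so it also solves the
        # constrained second problem.
        return min(f1_min, f2_relaxed_min)
    # Otherwise the constrained minimum of f2 is attained at a point where
    # f2 = f1 >= min f1, so the first minimization dominates.
    return f1_min

# Illustrative numbers only (not results from the paper).
value_feasible = inner_minimum(1.0, 0.5, mi_at_relaxed_solution=0.3, r2=0.2)
value_infeasible = inner_minimum(1.0, 0.5, mi_at_relaxed_solution=0.1, r2=0.2)
```

With this rule only convex problems need to be solved, since the non-convex constraint is either inactive or irrelevant to the final value.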

It remains to show that the objective functions of 
the minimization problems in (3), $f_1(P_{X_1X_2Y_1}, P_{X_1'X_2'Y_1'})$ and 
$f_2(P_{X_1X_2Y_1}, P_{X_1'X_2'Y_1'})$, restricted to the domain $\mathcal{D}$, are convex 
functions. Since the sum of convex functions is convex, to 
prove the convexity of $f_1(\cdot)$ on $\mathcal{D}$, we only need to prove that 
the different terms of 

$$f_1 = -\overline{\rho\lambda}\, E_{X_1X_2Y_1} \log q_1(Y_1|X_1,X_2) - \rho\lambda\, E_{X_1'X_2'Y_1'} \log q_1(Y_1'|X_1',X_2') - H(Y_1|X_1) + \rho I(X_1';Y_1') + \max\big\{ I(X_2;X_1,Y_1) - R_2;\ \overline{\rho\lambda}\,(I(X_2;X_1,Y_1) - R_2) \big\} + \max\big\{ \bar{\rho} I(X_2';Y_1') + \rho I(X_2';X_1',Y_1') - R_2;\ \rho(I(X_2';X_1',Y_1') - R_2);\ \rho\lambda(I(X_2';X_1',Y_1') - R_2) \big\} \qquad (16)$$

are convex within $\mathcal{D}$. 

First, we have that $-\overline{\rho\lambda}\, E_{X_1X_2Y_1} \log q_1(Y_1|X_1,X_2) - 
\rho\lambda\, E_{X_1'X_2'Y_1'} \log q_1(Y_1'|X_1',X_2')$ is linear in 
$(P_{X_1X_2Y_1}, P_{X_1'X_2'Y_1'})$ and therefore convex. Also, we 
have that $-H(Y_1|X_1) = H(X_1) - H(X_1,Y_1)$ is convex for 
fixed $P_{X_1}$ due to the concavity of $H(X_1,Y_1)$. 

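In addition, $I(X_1';Y_1')$ can be written as $D(P_{X_1'Y_1'} \| P_{X_1'} \times 
P_{Y_1'})$. Let $P = \lambda P^{(1)} + (1 - \lambda)P^{(2)}$ for any $P^{(1)}, P^{(2)}$ such that 
$P^{(1)}_{X_1'} = P^{(2)}_{X_1'}$ and $\lambda \in [0,1]$. We have that $P_{X_1'Y_1'} = \lambda P^{(1)}_{X_1'Y_1'} + 
(1 - \lambda)P^{(2)}_{X_1'Y_1'}$ and $P_{X_1'} \times P_{Y_1'} = P_{X_1'} \times (\lambda P^{(1)}_{Y_1'} + (1 - \lambda)P^{(2)}_{Y_1'}) = 
\lambda(P^{(1)}_{X_1'} \times P^{(1)}_{Y_1'}) + (1 - \lambda)(P^{(2)}_{X_1'} \times P^{(2)}_{Y_1'})$. The convexity of 
$\rho I(X_1';Y_1')$ for fixed $P_{X_1'}$ follows from the convexity of 
$D(P\|Q)$ in the pair $(P,Q)$: 

$$I(X_1';Y_1') = D(P_{X_1'Y_1'} \| P_{X_1'} \times P_{Y_1'}) \le \lambda D(P^{(1)}_{X_1'Y_1'} \| P^{(1)}_{X_1'} \times P^{(1)}_{Y_1'}) + (1 - \lambda) D(P^{(2)}_{X_1'Y_1'} \| P^{(2)}_{X_1'} \times P^{(2)}_{Y_1'}) = \lambda I^{(1)}(X_1';Y_1') + (1 - \lambda) I^{(2)}(X_1';Y_1') \qquad (17)$$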
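
The convexity of the mutual information in the joint distribution, for a fixed input marginal, can be illustrated numerically. The sketch below mixes two hypothetical binary joint pmfs that share the same $X$-marginal and compares the mutual information of the mixture with the mixture of the mutual informations; the matrices `P_a`, `P_b` and the helper names are illustrative.

```python
import math

def mutual_info(joint):
    """I(X;Y) in nats for a joint pmf given as a nested list joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

def mix(p1, p2, lam):
    """Convex combination lam * p1 + (1 - lam) * p2 of two joint pmfs."""
    return [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(p1, p2)]

# Two joint pmfs with the same X-marginal (0.4, 0.6) but opposite channels.
P_a = [[0.36, 0.04], [0.06, 0.54]]
P_b = [[0.04, 0.36], [0.54, 0.06]]

# Slack of the convexity inequality on a grid of mixing weights.
slack = [
    lam * mutual_info(P_a) + (1 - lam) * mutual_info(P_b)
    - mutual_info(mix(P_a, P_b, lam))
    for lam in (i / 20 for i in range(21))
]
```

At the midpoint the mixture makes $X$ and $Y$ independent, so the inequality is strict there.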



Continuing with the next term of (16), 

$$\max\big\{ I(X_2;X_1,Y_1) - R_2;\ \overline{\rho\lambda}\,(I(X_2;X_1,Y_1) - R_2) \big\},$$

we note that it is the maximum of two convex functions, 
and therefore convex. The convexity of each of the individual 
functions follows from the convexity of $I(X_2;X_1,Y_1)$ for 
fixed $P_{X_1}$, $P_{X_2}$, which can be proved along the same lines 
as (17). 

Finally, we consider the last term of (16): 

$$\max\big\{ \bar{\rho} I(X_2';Y_1') + \rho I(X_2';X_1',Y_1') - R_2;\ \rho(I(X_2';X_1',Y_1') - R_2);\ \rho\lambda(I(X_2';X_1',Y_1') - R_2) \big\}.$$

Each of the arguments of the $\max\{\ldots\}$ can be shown to be 
the sum of convex functions for fixed $P_{X_1'}$ and $P_{X_2'}$, using 
a similar argument as the one used to prove (17). Since the 
maximum of convex functions is convex, the convexity of $f_1$ 
restricted to $\mathcal{D}$ follows. 

Using similar arguments, it is easy to show that 

$$f_2 = -\overline{\rho\lambda}\, E_{X_1X_2Y_1} \log q_1(Y_1|X_1,X_2) - \rho\lambda\, E_{X_1'X_2'Y_1'} \log q_1(Y_1'|X_1',X_2') - H(Y_1|X_1) + \rho I(X_1';X_2',Y_1') + I(X_2;X_1,Y_1) - R_2$$

is convex in $\mathcal{D}$. 



VI. Numerical Results 

In this section, we present a numerical example to show 
the performance of the error exponent region introduced in 
Theorem 1. We use as a baseline for comparison the error 
exponent region of Section III, which is obtained with minor 
modifications from known results for single user and multiple 
access channels. 

We present results for the binary Z-channel model: $Y_1 = 
X_1 * X_2 \oplus Z$, $Y_2 = X_2$, where $X_1, X_2, Y_1, Y_2 \in \{0,1\}$, $Z \sim 
\text{Bernoulli}(p)$, $*$ is multiplication, and $\oplus$ is modulo 2 addition. 
This is a modified version of the binary erasure IFC studied 
in [10], where we add noise $Z$ to the received signal of user 
1. In the results presented here, we fix $p = 0.01$. 
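
For concreteness, the joint distribution of $(X_1, X_2, Y_1)$ induced by this channel with independent inputs can be tabulated and used to evaluate the mutual informations that delimit the region of positive exponents. The sketch below assumes the input distributions $Q_1(1) = 0.6$ and $Q_2(1) = 0.9$ (one of the parameter choices quoted with the figures); the function names are illustrative and no numerical values from the paper are reproduced.

```python
import math
from itertools import product

def z_channel_joint(q1_one, q2_one, p=0.01):
    """Joint pmf of (X1, X2, Y1) for Y1 = X1*X2 xor Z with independent
    inputs, Pr(X1=1) = q1_one, Pr(X2=1) = q2_one and Z ~ Bernoulli(p)."""
    joint = {}
    for x1, x2, z in product((0, 1), repeat=3):
        prob = ((q1_one if x1 else 1 - q1_one)
                * (q2_one if x2 else 1 - q2_one)
                * (p if z else 1 - p))
        key = (x1, x2, (x1 * x2) ^ z)
        joint[key] = joint.get(key, 0.0) + prob
    return joint

def H(joint, idx):
    """Entropy in nats of the marginal on the coordinates in idx."""
    marg = {}
    for sym, prob in joint.items():
        key = tuple(sym[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(v * math.log(v) for v in marg.values() if v > 0)

P = z_channel_joint(0.6, 0.9)
# Coordinates: 0 -> X1, 1 -> X2, 2 -> Y1.
i_x1_y1 = H(P, [0]) + H(P, [2]) - H(P, [0, 2])
i_x2_y1 = H(P, [1]) + H(P, [2]) - H(P, [1, 2])
i_x1_y1_given_x2 = H(P, [0, 1]) + H(P, [1, 2]) - H(P, [1]) - H(P, [0, 1, 2])
```

Because the inputs are independent, conditioning on $X_2$ can only increase the information that $Y_1$ carries about $X_1$.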

The boundary of the error exponent region is a surface in 
four dimensions $R_1, R_2, E_{R,1}, E_{R,2}$. This surface can be obtained 
parametrically by computing $E_{R,1}, E_{R,2}$ as a function 
of $R_1, R_2, Q_1, Q_2$, by optimizing over $\rho$ and $\lambda$ in (3) and in 
the corresponding expression for $E_{R,2}$. The parameterization 
of $E_{R,i}$ in terms of $R_1, R_2, Q_1, Q_2$ allows the study of the 
error performance as a function of the parameters that directly 
influence it. 

Fig. 2 shows that the error exponents under optimal decoding 
derived in this paper can be strictly better than the baseline 
error exponents of Section III. This suggests that the inequality 
obtained in Appendix B for $R_1 = 0$ can be strict. In addition, 
in all the plots that we computed for the Z-channel for different 
values of $Q_1, Q_2$ and $R_2$ we were not able to find a single 
case where the baseline exponent $E_{B,1}$ was larger than $E_{R,1}$. 

We see that the curves of $E_{R,1}$ ($E_{B,1}$) for fixed $R_2, Q_1, Q_2$ 
have a linear part for $R_1$ below a critical value $R_{1,c}^{(R)}$ ($R_{1,c}^{(B)}$), 
and a curvy part for $R_1 > R_{1,c}^{(R)}$ ($R_1 > R_{1,c}^{(B)}$) (note that 
the critical values depend on the parameters $R_2$, $Q_1$ and $Q_2$). 
Figure 3 shows the optimal parameters $\rho$ and $\lambda$ for the $E_{R,1}$ 
curves shown in Fig. 2 for $R_2 = 0.139$ and $R_2 = 0.277$ 
nats/channel use. We see that for the linear part of the $E_{R,1}$ 
curves $\rho = 1$ and $\lambda = 1/2$ are optimal, while for the curvy 
part (i.e., $R_1 > R_{1,c}^{(R)}$) the optimal $\rho$ decreases to 0 and 
the optimal $\lambda$ increases towards 1. For $R_1$ in the interval 
$(0, \min\{R_{1,c}^{(R)}, R_{1,c}^{(B)}\})$ the gap between the $E_{R,1}$ and $E_{B,1}$ 
curves remains constant as both curves are lines with slope 
$-1$, and this gap is equal to the gap at $R_1 = 0$. In general, any 
gap between $E_{R,1}$ and $E_{B,1}$ at $R_1 = 0$ will remain constant 
in the interval where both curves have slope $-1$. We also note 
that, since the optimal parameters $\rho$ and $\lambda$ vary for different rates, 
these parameters are indeed active, i.e., they have influence on 
the resulting error exponent. 

[Figure 2: horizontal axis $R_1$ [nats/channel use].] 
Fig. 2. Error exponents as a function of $R_1$ for two different values of $R_2$ 
and fixed choices $Q_1$, $Q_2$. All the rates are in nats. 

[Figure 3: curves of $\rho$ and $\lambda$ for $R_2 = 0.139$, $Q_1(1) = 0.6$, $Q_2(1) = 0.9$, and for $R_2 = 0.277$, $Q_1(1) = 0.6$, $Q_2(1) = 0.7$; horizontal axis $R_1$ [nats/channel use].] 
Fig. 3. Optimal parameters $\rho$ and $\lambda$ for the $E_{R,1}$ curves of Fig. 2. All the 
rates are in nats. 

The curves of Fig. 2 are obtained for fixed choices
of $Q_1$ and $Q_2$, which are the distributions used to
generate the random fixed composition codebooks.
As $Q_1$ and $Q_2$ vary in the probability simplex $\mathcal{S}$,
we obtain the four-dimensional error exponent region

$$\{(R_1, R_2, E_{R,1}(R_1,R_2,Q_1,Q_2), E_{R,2}(R_1,R_2,Q_1,Q_2)) : Q_1, Q_2 \in \mathcal{S}\}.$$

In order to obtain a two-dimensional plot of
the region, we consider a projection: we fix $R_2$, vary $R_1$,
and plot the maximum value over $Q_1$ and $Q_2$ in the error
exponent region of $\min\{E_{R,1}, E_{R,2}\}$. This corresponds to
choosing $Q_1$ and $Q_2$ in order to maximize the error exponent
simultaneously achievable for both users. Fig. 4 shows this
projection for $R_2 = 0.139$ and $R_2 = 0.277$ nats/channel use,
where, for reference, we included the corresponding curves
for the error exponents $E_{B,1}, E_{B,2}$ of Section III.



Fig. 4. Maximum error exponent simultaneously achievable for both users
for fixed $R_2$ as a function of $R_1$ [nats/channel use].

For the noiseless binary channel of user 2, $E_{R,2} = \max\{H(Q_2) - R_2, 0\}$, and as a result, $E_{R,2}$ decreases with
increasing $\Pr(X_2 = 1)$ for $\Pr(X_2 = 1) \ge 1/2$. On the
other hand, because of the multiplication between $X_1$ and $X_2$
in the received signal $Y_1$, increasing $\Pr(X_2 = 1)$ results in
less interference for user 1, and a larger value of $E_{R,1}$. It
follows that there is a direct trade-off between $E_{R,1}$ and $E_{R,2}$
through the choice of $Q_2$, and whenever $\min\{E_{R,1}, E_{R,2}\}$ is
maximized, $E_{R,1} = E_{R,2}$. Therefore, in the curves of Fig. 4,
$E_{R,1} = E_{R,2}$.
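As a small numerical illustration of the noiseless-channel exponent $\max\{H(Q_2) - R_2, 0\}$ quoted above, the following Python sketch evaluates it for a few values of $\Pr(X_2 = 1)$. The function names are ours and purely illustrative; entropies and rates are taken in nats, as in the plots.

```python
import math


def binary_entropy_nats(p: float) -> float:
    """Entropy, in nats, of a Bernoulli(p) distribution."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def noiseless_exponent(p_one: float, rate: float) -> float:
    """max{H(Q_2) - R_2, 0} with Q_2(1) = p_one and R_2 = rate (nats)."""
    return max(binary_entropy_nats(p_one) - rate, 0.0)


# The exponent shrinks as Pr(X_2 = 1) moves away from 1/2.
for p_one in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(p_one, round(noiseless_exponent(p_one, 0.139), 4))
```

The loop only prints values; it makes the monotone decrease for $\Pr(X_2 = 1) \ge 1/2$ visible without relying on any other quantity from the paper.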

From the plots of Figs. 2 and 4 we see that the error
exponents obtained from Theorem 1 sometimes outperform,
and are never worse than, the baseline error exponents of
Section III.

Appendix A
Proof of Theorem 1

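It is easy to see that the optimum decoder for user 1
picks the message $m$ ($1 \le m \le M_1$) that maximizes
$\frac{1}{M_2}\sum_{\mathbf{x}_2 \in \mathcal{C}_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{x}_{1,m}, \mathbf{x}_2)$, where $M_1 = \lceil e^{nR_1} \rceil$ and
$M_2 = \lceil e^{nR_2} \rceil$. Applying Gallager's general upper bound to
the "channel" $P(\mathbf{y}_1|\mathbf{x}_1) = \frac{1}{M_2}\sum_{\mathbf{x}_2 \in \mathcal{C}_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{x}_1, \mathbf{x}_2)$, we
have for user no. 1:

$$P_{E_1|m} \le \sum_{\mathbf{y}_1} \left[\frac{1}{M_2}\sum_{\mathbf{x}_2 \in \mathcal{C}_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{x}_{1,m}, \mathbf{x}_2)\right]^{1-\rho\lambda} \times \left[\sum_{m' \ne m}\left(\frac{1}{M_2}\sum_{\mathbf{x}_2 \in \mathcal{C}_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{x}_{1,m'}, \mathbf{x}_2)\right)^{\lambda}\right]^{\rho} \tag{A.1}$$

where $\lambda \ge 0$ and $\rho \ge 0$ are arbitrary parameters to be
optimized in the sequel. Thus, the average error probability
is upper bounded by the expectation of the above w.r.t. the
ensemble of codes of both users. Let us take the expectation
w.r.t. the ensemble of user 1 first, and we denote this expectation
operator by $E_{\mathcal{C}_1}\{\cdot\}$. Since the codewords of user 1 are
independent, the expectation of the summand in the sum above
is given by the product of expectations, namely, the product
of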

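$$A \triangleq E_{\mathcal{C}_1}\left\{\left[\frac{1}{M_2}\sum_{\mathbf{x}_2 \in \mathcal{C}_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{x}_{1,m}, \mathbf{x}_2)\right]^{1-\rho\lambda}\right\} = M_2^{\rho\lambda-1} E_{\mathbf{X}_1}\left\{\left[\sum_{\mathbf{x}_2 \in \mathcal{C}_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{X}_1, \mathbf{x}_2)\right]^{1-\rho\lambda}\right\} \tag{A.2}$$

and

$$B \triangleq E_{\mathcal{C}_1}\left\{\left[\sum_{m' \ne m}\left(\frac{1}{M_2}\sum_{\mathbf{x}_2 \in \mathcal{C}_2} q_1^{(n)}(\mathbf{y}_1|\mathbf{x}_{1,m'}, \mathbf{x}_2)\right)^{\lambda}\right]^{\rho}\right\}.$$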
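To make the structure of Gallager's general bound in (A.1) concrete, the following Python sketch evaluates its right-hand side by brute force for a toy single-user setting and compares it with the exact maximum-likelihood error probability. Everything here is an illustrative assumption rather than part of the derivation: the channel is a binary symmetric channel, the two-codeword codebook is arbitrary, and the function names are ours. The averaged "channel" of the IFC would take the place of `bsc_likelihood`.

```python
import itertools


def bsc_likelihood(y, x, p):
    """P(y|x) for a memoryless binary symmetric channel with crossover p."""
    flips = sum(a != b for a, b in zip(x, y))
    return (p ** flips) * ((1.0 - p) ** (len(x) - flips))


def gallager_bound(codebook, m, p, rho, lam):
    """sum_y P(y|x_m)^(1 - rho*lam) * [sum_{m' != m} P(y|x_m')^lam]^rho."""
    n = len(codebook[m])
    total = 0.0
    for y in itertools.product((0, 1), repeat=n):
        own = bsc_likelihood(y, codebook[m], p) ** (1.0 - rho * lam)
        others = sum(bsc_likelihood(y, c, p) ** lam
                     for k, c in enumerate(codebook) if k != m)
        total += own * others ** rho
    return total


def exact_ml_error(codebook, m, p):
    """Error probability of ML decoding given message m (ties count as errors)."""
    n = len(codebook[m])
    err = 0.0
    for y in itertools.product((0, 1), repeat=n):
        own = bsc_likelihood(y, codebook[m], p)
        best_other = max(bsc_likelihood(y, c, p)
                         for k, c in enumerate(codebook) if k != m)
        if best_other >= own:
            err += own
    return err


codebook = [(0, 0, 0), (1, 1, 1)]  # toy repetition code, chosen for illustration
bound = gallager_bound(codebook, 0, 0.1, rho=1.0, lam=0.5)
exact = exact_ml_error(codebook, 0, 0.1)
print(bound, exact)
```

With $\rho = 1$ and $\lambda = 1/2$ the bound reduces to the Bhattacharyya bound, which is why those values are used in the usage lines.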

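Now, let $N_{\mathbf{x}_1,\mathbf{y}_1}(P_{\hat X_1\hat X_2\hat Y_1})$ denote the number of codewords
$\{\mathbf{x}_2\}$ that form a joint empirical PMF $P_{\hat X_1\hat X_2\hat Y_1}$ together with
a given $\mathbf{x}_1$ and $\mathbf{y}_1$. Then, using the inequality $(\sum_i a_i)^{s} \le \sum_i a_i^{s}$ (valid for $a_i \ge 0$ and $0 \le s \le 1$), $A$ can be bounded by

$$A = M_2^{\rho\lambda-1} E_{\mathbf{X}_1}\left[\sum_{P_{\hat X_1\hat X_2\hat Y_1}} N_{\mathbf{X}_1,\mathbf{y}_1}(P_{\hat X_1\hat X_2\hat Y_1})\, e^{nE_{\hat X_1\hat X_2\hat Y_1}\log q_1(Y_1|X_1,X_2)}\right]^{1-\rho\lambda} \le M_2^{\rho\lambda-1} \sum_{P_{\hat X_1\hat X_2\hat Y_1}} E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P_{\hat X_1\hat X_2\hat Y_1})\, e^{n(1-\rho\lambda)E_{\hat X_1\hat X_2\hat Y_1}\log q_1(Y_1|X_1,X_2)},$$

where $q_1(Y_1|X_1, X_2)$ is the single-letter transition probability
distribution of the IFC, and where $E_{\hat X_1\hat X_2\hat Y_1} f(X_1, X_2, Y_1)$, for
a generic function $f$, denotes the expectation operator when
the RV's $(X_1, X_2, Y_1)$ are understood to be distributed according
to $P_{\hat X_1\hat X_2\hat Y_1}$. Similarly (and using Jensen's inequality to
push the expectation w.r.t. $\mathcal{C}_1$ into the brackets), we have:

$$B \le M_2^{-\rho\lambda} M_1^{\rho} \left[\sum_{P_{\hat X_1'\hat X_2'\hat Y_1'}} E_{\mathbf{X}_1'} N^{\lambda}_{\mathbf{X}_1',\mathbf{y}_1}(P_{\hat X_1'\hat X_2'\hat Y_1'})\, e^{n\lambda E_{\hat X_1'\hat X_2'\hat Y_1'}\log q_1(Y_1'|X_1',X_2')}\right]^{\rho}. \tag{A.4}$$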

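Taking the product of these two expressions, applying the inequality $(\sum_i a_i)^{\rho} \le \sum_i a_i^{\rho}$ (valid for $a_i \ge 0$ and $0 \le \rho \le 1$) to
the summation in the bound for $B$, and taking expectations
with respect to the codebook $\mathcal{C}_2$ yields

$$E_{\mathcal{C}_2}\{AB\} \le M_1^{\rho} M_2^{-1} \sum_{P_{\hat X_1\hat X_2\hat Y_1}} \sum_{P_{\hat X_1'\hat X_2'\hat Y_1'}} E_{\mathcal{C}_2}\left[E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P_{\hat X_1\hat X_2\hat Y_1}) \cdot \left[E_{\mathbf{X}_1'} N^{\lambda}_{\mathbf{X}_1',\mathbf{y}_1}(P_{\hat X_1'\hat X_2'\hat Y_1'})\right]^{\rho}\right] \times \exp\left\{n\left[(1-\rho\lambda)E_{\hat X_1\hat X_2\hat Y_1}\log q_1(Y_1|X_1,X_2) + \rho\lambda E_{\hat X_1'\hat X_2'\hat Y_1'}\log q_1(Y_1'|X_1',X_2')\right]\right\}. \tag{A.5}$$

The next step is to bound the term involving the expectation
over $\mathcal{C}_2$. As noted, the codewords $\{\mathbf{X}_1\}$ and $\{\mathbf{X}_2\}$ are
randomly selected i.i.d. over the type classes $T_1 = T_{Q_1}$ and
$T_2 = T_{Q_2}$ corresponding to probability distributions $Q_1$ and
$Q_2$, respectively. To avoid cumbersome notation, we denote
hereafter $P = P_{\hat X_1\hat X_2\hat Y_1}$ and $P' = P_{\hat X_1'\hat X_2'\hat Y_1'}$, and assume that
$P_{\hat X_1} = P_{\hat X_1'} = Q_1$, $P_{\hat X_2} = P_{\hat X_2'} = Q_2$, $P_{\hat Y_1} = P_{\hat Y_1'}$, and that
$\mathbf{y}_1$ lies in the type class corresponding to $P_{\hat Y_1}$. We will also
use the shorthand notation

$$\bar E_{\mathcal{C}_2} \triangleq E_{\mathcal{C}_2}\left[E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P) \cdot \left[E_{\mathbf{X}_1'} N^{\lambda}_{\mathbf{X}_1',\mathbf{y}_1}(P')\right]^{\rho}\right]. \tag{A.6}$$

The bounding of $\bar E_{\mathcal{C}_2}$ requires considering multiple cases
which depend on how $R_2$ compares to different information
quantities, and also depend on properties of the joint types
$P_{\hat X_1\hat X_2\hat Y_1}$, $P_{\hat X_1'\hat X_2'\hat Y_1'}$. In order to guide the reader through
the different steps we present in Fig. 5 below a schematic
representation of the different cases that arise.

We first consider two different ranges of $R_2$, according to
its comparison with $I(\hat X_2'; \hat X_1', \hat Y_1')$:

1. The range $R_2 \ge I(\hat X_2'; \hat X_1', \hat Y_1')$. Here we have:

$$\begin{aligned}
\bar E_{\mathcal{C}_2} &= E_{\mathcal{C}_2}\left\{E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P) \cdot \left[\frac{1}{|T_1|}\sum_{\mathbf{x} \in T_1} N^{\lambda}_{\mathbf{x},\mathbf{y}_1}(P')\right]^{\rho}\right\} \\
&= E_{\mathcal{C}_2}\left\{E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P) \cdot \left[\frac{1}{|T_1|}\sum_{\mathbf{x} \in T_1} N^{\lambda}_{\mathbf{x},\mathbf{y}_1}(P')\right]^{\rho} \cdot 1\left\{\forall \mathbf{x} \in T_1 : N_{\mathbf{x},\mathbf{y}_1}(P') \le e^{n[(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))+\epsilon]}\right\}\right\} \\
&\quad + E_{\mathcal{C}_2}\left\{E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P) \cdot \left[\frac{1}{|T_1|}\sum_{\mathbf{x} \in T_1} N^{\lambda}_{\mathbf{x},\mathbf{y}_1}(P')\right]^{\rho} \cdot 1\left\{\exists \mathbf{x} \in T_1 : N_{\mathbf{x},\mathbf{y}_1}(P') > e^{n[(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))+\epsilon]}\right\}\right\} \\
&\le E_{\mathcal{C}_2}\left\{E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\right\} \cdot \left[e^{-n(H(\hat X_1')-\epsilon)} \sum_{\mathbf{x} \in T_1:\, (\mathbf{x},\mathbf{y}_1) \in T_{P_{\hat X_1'\hat Y_1'}}} e^{n\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')+\epsilon)}\right]^{\rho} \\
&\quad + e^{nR_2} \Pr\left\{\exists \mathbf{x} \in T_1 : N_{\mathbf{x},\mathbf{y}_1}(P') > e^{n[(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))+\epsilon]}\right\} \\
&\le E_{\mathcal{C}_2}\left\{E_{\mathbf{X}_1}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\right]\right\} \cdot e^{-n\rho[H(\hat X_1') - H(\hat X_1'|\hat Y_1')]} \times e^{n\rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))}
\end{aligned} \tag{A.7}$$

where in the second to last inequality we used $N_{\mathbf{x},\mathbf{y}_1} \le M_2$,
and in the last inequality we used the fact that

$$\Pr\left\{\exists \mathbf{x} \in T_1 : N_{\mathbf{x},\mathbf{y}_1}(P') > e^{n[(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))+\epsilon]}\right\} \le e^{n(H(\hat X_1')+\epsilon)} \cdot \Pr\left\{N_{\mathbf{x},\mathbf{y}_1}(P') > e^{n[(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))+\epsilon]}\right\} \tag{A.8}$$

for any $\mathbf{x} \in T_1$, which decays doubly exponentially with $n$
(cf. (El Appendix]).

To compute $E_{\mathcal{C}_2}\{E_{\mathbf{X}_1}[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)]\}$ we consider
two cases, according to the comparison between $R_2$ and
$I(\hat X_2; \hat X_1, \hat Y_1)$:

The case $R_2 \ge I(\hat X_2; \hat X_1, \hat Y_1)$. Here, we have:

$$E_{\mathcal{C}_2} E_{\mathbf{X}_1}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\right] = E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\right] \le E_{\mathbf{X}_1}\left[1\{(\mathbf{X}_1,\mathbf{y}_1) \in T_{P_{\hat X_1\hat Y_1}}\}\right] \times e^{n(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1))} \doteq e^{-nI(\hat X_1;\hat Y_1)}\, e^{n(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1))}. \tag{A.9}$$

Therefore, when

$$R_2 \ge \max\{I(\hat X_2; \hat X_1, \hat Y_1),\, I(\hat X_2'; \hat X_1', \hat Y_1')\}$$

we have:

$$\bar E_{\mathcal{C}_2} \le \exp\left\{n\left[-I(\hat X_1;\hat Y_1) + (1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)) - \rho I(\hat X_1';\hat Y_1') + \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\right]\right\}. \tag{A.10}$$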

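The case $R_2 < I(\hat X_2; \hat X_1, \hat Y_1)$. Here we have:

$$E_{\mathcal{C}_2} E_{\mathbf{X}_1}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\right] \le E_{\mathcal{C}_2} E_{\mathbf{X}_1}\left[N_{\mathbf{X}_1,\mathbf{y}_1}(P)\right] \le e^{-nI(\hat X_1;\hat Y_1)} \cdot e^{n(R_2 - I(\hat X_2;\hat X_1,\hat Y_1))} \tag{A.11}$$

where we used the fact that $1-\rho\lambda \le 1$ and then estimated the
expectation of $N_{\mathbf{X}_1,\mathbf{y}_1}(P)$ as $M_2$ times the probability that $\mathbf{x}_2$
would fall into the corresponding conditional type. Therefore,
when

$$I(\hat X_2'; \hat X_1', \hat Y_1') \le R_2 < I(\hat X_2; \hat X_1, \hat Y_1)$$

we have:

$$\bar E_{\mathcal{C}_2} \le \exp\left\{n\left[-I(\hat X_1;\hat Y_1) + (R_2 - I(\hat X_2;\hat X_1,\hat Y_1)) - \rho I(\hat X_1';\hat Y_1') + \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\right]\right\}. \tag{A.12}$$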
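The method-of-types estimate used above, namely that a sequence drawn uniformly from its type class lands in a prescribed conditional type with probability of exponential order $e^{-nI}$, can be checked numerically for binary sequences. The Python sketch below is only an illustration under our own assumptions (binary alphabets, a specific joint type with $n = 400$, hypothetical function names); it counts sequences exactly with binomial coefficients and compares the resulting exponent with the mutual information of the joint type.

```python
import math


def mutual_information_nats(joint):
    """I(X;Y) in nats for a joint pmf given as {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), v in joint.items():
        px[x] = px.get(x, 0.0) + v
        py[y] = py.get(y, 0.0) + v
    return sum(v * math.log(v / (px[x] * py[y]))
               for (x, y), v in joint.items() if v > 0)


def conditional_type_log_prob(counts):
    """log Pr{(X, y) has joint type `counts`} for binary sequences, when X is
    uniform over its own type class and y is a fixed sequence of the right type.
    counts[(x, y)] is the number of positions carrying the symbol pair (x, y)."""
    n = sum(counts.values())
    n_y0 = counts[(0, 0)] + counts[(1, 0)]
    n_y1 = counts[(0, 1)] + counts[(1, 1)]
    n_x1 = counts[(1, 0)] + counts[(1, 1)]
    favourable = math.comb(n_y0, counts[(1, 0)]) * math.comb(n_y1, counts[(1, 1)])
    return math.log(favourable) - math.log(math.comb(n, n_x1))


counts = {(0, 0): 160, (1, 0): 40, (0, 1): 40, (1, 1): 160}  # illustrative type
n = sum(counts.values())
joint = {k: v / n for k, v in counts.items()}
rate = -conditional_type_log_prob(counts) / n
print(rate, mutual_information_nats(joint))
```

The two printed numbers differ only by a term that vanishes as $n$ grows, which is the sense in which the bounds of this appendix hold to first order in the exponent.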



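The exponents for the subcases (A.10) and (A.12), corresponding
to $R_2 \ge I(\hat X_2; \hat X_1, \hat Y_1)$ and $R_2 < I(\hat X_2; \hat X_1, \hat Y_1)$,
respectively, differ only in the factors ($1-\rho\lambda$ and 1, resp.)
multiplying the term $R_2 - I(\hat X_2; \hat X_1, \hat Y_1)$. Therefore, we can
consolidate these two subcases of $R_2 \ge I(\hat X_2'; \hat X_1', \hat Y_1')$ into
the expression:

$$\bar E_{\mathcal{C}_2} \le \exp\left\{n\left[-I(\hat X_1;\hat Y_1) + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, (R_2 - I(\hat X_2;\hat X_1,\hat Y_1))\} - \rho I(\hat X_1';\hat Y_1') + \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\right]\right\}, \tag{A.13}$$

since $\min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, (R_2 - I(\hat X_2;\hat X_1,\hat Y_1))\}$
is $(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1))$ when $R_2 \ge I(\hat X_2;\hat X_1,\hat Y_1)$ and
$(R_2 - I(\hat X_2;\hat X_1,\hat Y_1))$ when $R_2 < I(\hat X_2;\hat X_1,\hat Y_1)$.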
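2. The range $R_2 < I(\hat X_2'; \hat X_1', \hat Y_1')$. In this range,

$$\bar E_{\mathcal{C}_2} = E_{\mathcal{C}_2}\left[E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P) \cdot \left[E_{\mathbf{X}_1'} N^{\lambda}_{\mathbf{X}_1',\mathbf{y}_1}(P')\right]^{\rho}\right] \le E_{\mathcal{C}_2}\left[E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P) \cdot \left[E_{\mathbf{X}_1'} N_{\mathbf{X}_1',\mathbf{y}_1}(P')\right]^{\rho}\right] \tag{A.14}$$

where we assumed $\lambda \le 1$ in the last step. The second
expectation over $\mathbf{X}_1'$ can be evaluated as

$$E_{\mathbf{X}_1'} N_{\mathbf{X}_1',\mathbf{y}_1}(P') = E_{\mathbf{X}_1'} \sum_{\mathbf{x}_2 \in \mathcal{C}_2} 1\left((\mathbf{X}_1', \mathbf{x}_2, \mathbf{y}_1) \in T_{P'}\right) = e^{-nI(\hat X_1';\hat X_2',\hat Y_1')} \sum_{\mathbf{x}_2 \in \mathcal{C}_2} 1\left((\mathbf{x}_2, \mathbf{y}_1) \in T_{P_{\hat X_2'\hat Y_1'}}\right) = e^{-nI(\hat X_1';\hat X_2',\hat Y_1')} N_{\mathbf{y}_1}(P_{\hat X_2'\hat Y_1'}), \tag{A.15}$$

where $N_{\mathbf{y}_1}(P_{\hat X_2'\hat Y_1'})$ is the number of codewords $\{\mathbf{x}_2\}$ that are
jointly typical with $\mathbf{y}_1$ according to $P_{\hat X_2'\hat Y_1'}$. Thus,

$$\bar E_{\mathcal{C}_2} \le e^{-n\rho I(\hat X_1';\hat X_2',\hat Y_1')} E_{\mathcal{C}_2}\left[E_{\mathbf{X}_1} N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P_{\hat X_2'\hat Y_1'})\right] = e^{-n\rho I(\hat X_1';\hat X_2',\hat Y_1')} E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P_{\hat X_2'\hat Y_1'})\right]. \tag{A.16}$$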

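To bound $E_{\mathbf{X}_1} E_{\mathcal{C}_2}[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P')]$, we consider two
cases depending on how $R_2$ compares to $I(\hat X_2'; \hat Y_1')$.

The case $R_2 \ge I(\hat X_2'; \hat Y_1')$. Here, we have:

$$\begin{aligned}
E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P')\right]
&= E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P') \cdot 1\left\{N_{\mathbf{y}_1}(P') \le e^{n(R_2 - I(\hat X_2';\hat Y_1')+\epsilon)}\right\}\right] \\
&\quad + E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P') \cdot 1\left\{N_{\mathbf{y}_1}(P') > e^{n(R_2 - I(\hat X_2';\hat Y_1')+\epsilon)}\right\}\right] \\
&\le e^{n\rho(R_2 - I(\hat X_2';\hat Y_1'))} E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\right] + e^{n(1-\rho\lambda+\rho)R_2} \Pr\left\{N_{\mathbf{y}_1}(P') > e^{n(R_2 - I(\hat X_2';\hat Y_1')+\epsilon)}\right\} \\
&\le \exp\Big\{n\Big[\rho(R_2 - I(\hat X_2';\hat Y_1')) - I(\hat X_1;\hat Y_1) + 1(R_2 \ge I(\hat X_2;\hat X_1,\hat Y_1))\,(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)) \\
&\qquad\qquad + 1(R_2 < I(\hat X_2;\hat X_1,\hat Y_1))\,(R_2 - I(\hat X_2;\hat X_1,\hat Y_1))\Big]\Big\} \\
&= \exp\Big\{n\Big[\rho(R_2 - I(\hat X_2';\hat Y_1')) - I(\hat X_1;\hat Y_1) + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, (R_2 - I(\hat X_2;\hat X_1,\hat Y_1))\}\Big]\Big\}
\end{aligned} \tag{A.17}$$

where we used the fact that $\Pr[N_{\mathbf{y}_1}(P') > e^{n(R_2 - I(\hat X_2';\hat Y_1')+\epsilon)}]$ decays doubly exponentially in the
third inequality, and bounded $E_{\mathbf{X}_1} E_{\mathcal{C}_2}[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)]$ using
(A.9) and (A.11) in the last inequality.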



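The case $R_2 < I(\hat X_2'; \hat Y_1')$. Here, we further split the evaluation
into two parts. In the first part, $R_2 \ge I(\hat X_2; \hat X_1, \hat Y_1)$, and we
have:

$$\begin{aligned}
E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P')\right]
&\le E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P') \cdot 1\left\{N_{\mathbf{X}_1,\mathbf{y}_1}(P) \le e^{n(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)+\epsilon)}\right\}\right] \\
&\quad + E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P') \cdot 1\left\{N_{\mathbf{X}_1,\mathbf{y}_1}(P) > e^{n(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)+\epsilon)}\right\}\right] \\
&\le e^{n(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1))} E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[1\{(\mathbf{X}_1,\mathbf{y}_1) \in T_{P_{\hat X_1\hat Y_1}}\}\, N^{\rho}_{\mathbf{y}_1}(P')\right] \\
&\quad + e^{n(1-\rho\lambda+\rho)R_2} \Pr\left\{N_{\mathbf{X}_1,\mathbf{y}_1}(P) > e^{n(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)+\epsilon)}\right\} \\
&\le e^{n[(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)) - I(\hat X_1;\hat Y_1)]} E_{\mathcal{C}_2}\left[N^{\rho}_{\mathbf{y}_1}(P_{\hat X_2'\hat Y_1'})\right] \\
&\le \exp\left\{n\left[(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)) - I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2';\hat Y_1')\right]\right\}
\end{aligned} \tag{A.18}$$

where we used in the last inequality

$$E_{\mathcal{C}_2}\left[N^{\rho}_{\mathbf{y}_1}(P_{\hat X_2'\hat Y_1'})\right] \le E_{\mathcal{C}_2}\left[N_{\mathbf{y}_1}(P_{\hat X_2'\hat Y_1'})\right] = e^{n(R_2 - I(\hat X_2';\hat Y_1'))},$$

valid for $\rho \le 1$.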

The other part corresponds to $R_2 < I(\hat X_2; \hat X_1, \hat Y_1)$. Here
we have:

E x E C2 [N§ uyi (P)N yi (P')} 

+ £ Xi S C2 |iVj i yi (P)^ i (P')l[% 1 (P') > e~] 
<e""^ Xi S C2 {iVf i 2/i (P)l[iV yi (P') > 1] 



e n(pA+p)fl 2 p r 



iV yi (P')>e r 



<£ XiJ E C2 |iV^ i ^(P) • l[N Vi (P') > 1] x 

l[ N X u yS P )<e ne ] 
+ E x E C2 l [ N^ i yi (P)-l[Ny i (P') > l]x 

1 fe 1 ,y 1 (^)>^]} 



<e n ^E x E C2 ^l[N yi {P')>l}* 

^ Xl ,ySP)>A} 
+ e^E Xi l [ P I [N Xi yi (P) > 

Pr[iY yi (P')>l,Ar^ yi (P)>l] 



(A. 19) 

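To bound $\Pr[N_{\mathbf{y}_1}(P') \ge 1, N_{\mathbf{x}_1,\mathbf{y}_1}(P) \ge 1]$, we consider
two cases:

The first case is when $P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}$: in this case,
$\{N_{\mathbf{x}_1,\mathbf{y}_1}(P) \ge 1\} \subseteq \{N_{\mathbf{y}_1}(P') \ge 1\}$. Therefore,

$$\Pr[N_{\mathbf{y}_1}(P') \ge 1, N_{\mathbf{x}_1,\mathbf{y}_1}(P) \ge 1] = \Pr[N_{\mathbf{x}_1,\mathbf{y}_1}(P) \ge 1] \le e^{n(R_2 - I(\hat X_2;\hat X_1,\hat Y_1))}.$$

Replacing in (A.19), we get:

$$E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P')\right] \le \exp\left\{n\left[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\right]\right\}. \tag{A.20}$$

The other case is $P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}$: in this case, the same
codeword $\mathbf{x}_2$ cannot simultaneously satisfy $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}_1) \in T_{P_{\hat X_1\hat X_2\hat Y_1}}$
and $(\mathbf{x}_2, \mathbf{y}_1) \in T_{P_{\hat X_2'\hat Y_1'}}$. Therefore, we have that

$$\begin{aligned}
\Pr[N_{\mathbf{y}_1}(P') \ge 1, N_{\mathbf{x}_1,\mathbf{y}_1}(P) \ge 1]
&= \Pr\left[\exists\, \mathbf{x}_2 \ne \mathbf{x}_2' : (\mathbf{x}_1, \mathbf{x}_2', \mathbf{y}_1) \in T_{P_{\hat X_1\hat X_2\hat Y_1}},\ (\mathbf{x}_2, \mathbf{y}_1) \in T_{P_{\hat X_2'\hat Y_1'}}\right] \\
&\le \sum_{\mathbf{x}_2} \sum_{\mathbf{x}_2' \ne \mathbf{x}_2} \Pr\left[(\mathbf{x}_1, \mathbf{x}_2', \mathbf{y}_1) \in T_{P_{\hat X_1\hat X_2\hat Y_1}},\ (\mathbf{x}_2, \mathbf{y}_1) \in T_{P_{\hat X_2'\hat Y_1'}}\right] \\
&\le e^{n2R_2}\, e^{-nI(\hat X_2;\hat X_1,\hat Y_1)}\, e^{-nI(\hat X_2';\hat Y_1')}.
\end{aligned}$$

Replacing in (A.19), we get:

$$E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P')\right] \le \exp\left\{n\left[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1) + R_2 - I(\hat X_2';\hat Y_1')\right]\right\}. \tag{A.21}$$

This completes the decomposition of $\bar E_{\mathcal{C}_2}$ into the various
subcases.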

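Consolidation. Next, we carry out a consolidation process
that merges all of the above subcases into a more compact
expression, leading ultimately to the expression in Theorem 1.
Fig. 5 gives a schematic representation, in terms of a
tree, of the various consolidation steps described below. The
consolidation of (A.10) and (A.12) into (A.13) was done
before, but we include it in Fig. 5 for completeness. Referring
to Fig. 5, the consolidation starts at the deepest leaves of the
tree and works its way up the nodes until it reaches the root.

Fig. 5. Tree representing the multiple ranges of $R_2$ considered in the
derivation, and the equations that consolidate the different ranges.

We begin with the last set of subsubcases derived, $R_2 \ge I(\hat X_2; \hat X_1, \hat Y_1)$ and $R_2 < I(\hat X_2; \hat X_1, \hat Y_1)$ (expressions (A.18),
(A.20), and (A.21)) for the subcase $R_2 < I(\hat X_2'; \hat Y_1')$, and
consolidate them as follows:

$$\begin{aligned}
E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P')\right] \le \exp\Big\{n\Big\{
& 1(R_2 \ge I(\hat X_2;\hat X_1,\hat Y_1)) \times \big[(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)) - I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2';\hat Y_1')\big] \\
& + 1(R_2 < I(\hat X_2;\hat X_1,\hat Y_1))\, 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}) \times \big[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1) + R_2 - I(\hat X_2';\hat Y_1')\big] \\
& + 1(R_2 < I(\hat X_2;\hat X_1,\hat Y_1))\, 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}) \times \big[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\big]\Big\}\Big\}.
\end{aligned} \tag{A.22}$$

Next we would like to decompose the indicator $1(R_2 \ge I(\hat X_2; \hat X_1, \hat Y_1))$ appearing in the initial part of this expression
as

$$\begin{aligned}
1(R_2 \ge I(\hat X_2;\hat X_1,\hat Y_1)) &= 1(R_2 \ge I(\hat X_2;\hat X_1,\hat Y_1))\, 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}) + 1(R_2 \ge I(\hat X_2;\hat X_1,\hat Y_1))\, 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}) \\
&= 1(R_2 \ge I(\hat X_2;\hat X_1,\hat Y_1))\, 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}),
\end{aligned}$$

where we are taking into account in the last step that
for the present subcase $R_2 < I(\hat X_2'; \hat Y_1')$, $1(R_2 \ge I(\hat X_2;\hat X_1,\hat Y_1))\, 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}) = 0$, since for $P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}$ we have $R_2 < I(\hat X_2'; \hat Y_1') = I(\hat X_2; \hat Y_1) \le I(\hat X_2; \hat X_1, \hat Y_1)$.

Applying this decomposition to (A.22), then combining
terms having the same indicators $1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'})$ and
$1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})$, and replacing indicators by $\min\{\cdots\}$ as
appropriate (similar to (A.13)), we simplify (A.22) to

$$\begin{aligned}
E_{\mathbf{X}_1} E_{\mathcal{C}_2}\left[N^{1-\rho\lambda}_{\mathbf{X}_1,\mathbf{y}_1}(P)\, N^{\rho}_{\mathbf{y}_1}(P')\right] \le \exp\Big\{n\Big\{
& 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}) \big[-I(\hat X_1;\hat Y_1) + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} + R_2 - I(\hat X_2';\hat Y_1')\big] \\
& + 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})\, 1(R_2 < I(\hat X_2;\hat X_1,\hat Y_1)) \times \big[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\big]\Big\}\Big\}.
\end{aligned} \tag{A.23}$$

This is valid for the subcase $R_2 < I(\hat X_2'; \hat Y_1')$.

Next, we consolidate (A.17) from the subcase $R_2 \ge I(\hat X_2'; \hat Y_1')$ with (A.23) and insert the result into (A.16) to get

$$\begin{aligned}
\bar E_{\mathcal{C}_2} \le \exp\Big\{n\Big\{ & -\rho I(\hat X_1';\hat X_2',\hat Y_1') \\
& + 1(R_2 \ge I(\hat X_2';\hat Y_1')) \big[-I(\hat X_1;\hat Y_1) + \rho(R_2 - I(\hat X_2';\hat Y_1')) + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\}\big] \\
& + 1(R_2 < I(\hat X_2';\hat Y_1')) \Big[ 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}) \big[-I(\hat X_1;\hat Y_1) + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} + R_2 - I(\hat X_2';\hat Y_1')\big] \\
& \qquad + 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})\, 1(R_2 < I(\hat X_2;\hat X_1,\hat Y_1)) \times \big[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\big]\Big]\Big\}\Big\},
\end{aligned} \tag{A.24}$$

which applies to the range $R_2 < I(\hat X_2'; \hat X_1', \hat Y_1')$. Again,
expanding all terms against the indicators $1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'})$
and $1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})$, and, as above, replacing indicators by
$\min\{\cdots\}$ as appropriate, we obtain

$$\begin{aligned}
\bar E_{\mathcal{C}_2} \le \exp\Big\{n\Big\{ & 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}) \big[-\rho I(\hat X_1';\hat X_2',\hat Y_1') - I(\hat X_1;\hat Y_1) + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} \\
& \qquad + \min\{\rho(R_2 - I(\hat X_2';\hat Y_1')),\, R_2 - I(\hat X_2';\hat Y_1')\}\big] \\
& + 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}) \times \Big[-\rho I(\hat X_1';\hat X_2',\hat Y_1') + 1(R_2 \ge I(\hat X_2;\hat Y_1)) \times \big[-I(\hat X_1;\hat Y_1) + \rho(R_2 - I(\hat X_2';\hat Y_1')) \\
& \qquad + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\}\big] + 1(R_2 < I(\hat X_2;\hat Y_1)) \times \big[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\big]\Big]\Big\}\Big\}.
\end{aligned} \tag{A.25}$$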

Using the identity (proved via the chain rule)

$$I(\hat X_1'; \hat X_2', \hat Y_1') + I(\hat X_2'; \hat Y_1') = I(\hat X_2'; \hat X_1', \hat Y_1') + I(\hat X_1'; \hat Y_1')$$

twice, we can rewrite the term

$$-\rho I(\hat X_1';\hat X_2',\hat Y_1') + \min\{\rho(R_2 - I(\hat X_2';\hat Y_1')),\, R_2 - I(\hat X_2';\hat Y_1')\}$$

appearing after the indicator $1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'})$ in (A.25) as

$$-\rho I(\hat X_1';\hat Y_1') + \min\{\rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')),\, R_2 - (1-\rho) I(\hat X_2';\hat Y_1') - \rho I(\hat X_2';\hat X_1',\hat Y_1')\}.$$

Similarly, we can decompose the term $-\rho I(\hat X_1'; \hat X_2', \hat Y_1')$ appearing
after the indicator $1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})$ against the indicators
$1(R_2 \ge I(\hat X_2; \hat Y_1))$ and $1(R_2 < I(\hat X_2; \hat Y_1))$, and use the
above identity to combine it with $\rho(R_2 - I(\hat X_2'; \hat Y_1'))$ appearing
after the indicator $1(R_2 \ge I(\hat X_2; \hat Y_1))$. Incorporating these
steps, we can rewrite (A.25) as

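$$\begin{aligned}
\bar E_{\mathcal{C}_2} \le \exp\Big\{n\Big\{ & 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}) \big[-I(\hat X_1;\hat Y_1) - \rho I(\hat X_1';\hat Y_1') \\
& \qquad + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} \\
& \qquad + \min\{R_2 - (1-\rho) I(\hat X_2';\hat Y_1') - \rho I(\hat X_2';\hat X_1',\hat Y_1'),\, \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\}\big] \\
& + 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}) \times \Big[ 1(R_2 \ge I(\hat X_2;\hat Y_1)) \big[-I(\hat X_1;\hat Y_1) - \rho I(\hat X_1';\hat Y_1') \\
& \qquad + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} + \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\big] \\
& \qquad + 1(R_2 < I(\hat X_2;\hat Y_1)) \big[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1) - \rho I(\hat X_1';\hat X_2',\hat Y_1')\big]\Big]\Big\}\Big\}.
\end{aligned} \tag{A.26}$$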
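The chain-rule identity invoked in passing from (A.25) to (A.26) holds for any joint distribution of three random variables, so it can be sanity-checked numerically. The Python sketch below does this for a randomly generated joint pmf on binary triples; the helper names are ours, and the random pmf is only an illustrative stand-in for a joint type.

```python
import itertools
import math
import random


def entropy(pmf):
    """Entropy in nats of a pmf given as {outcome: prob}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def marginal(joint, axes):
    """Marginal pmf of the coordinates listed in `axes`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + p
    return out


def mutual_information(joint, a_axes, b_axes):
    """I(A;B) = H(A) + H(B) - H(A,B) for two groups of coordinates."""
    return (entropy(marginal(joint, a_axes)) + entropy(marginal(joint, b_axes))
            - entropy(marginal(joint, a_axes + b_axes)))


random.seed(0)
keys = list(itertools.product((0, 1), repeat=3))  # coordinates: (x1, x2, y1)
weights = [random.random() for _ in keys]
joint = {k: w / sum(weights) for k, w in zip(keys, weights)}

X1, X2, Y1 = (0,), (1,), (2,)
# I(X1; X2, Y1) + I(X2; Y1)  versus  I(X2; X1, Y1) + I(X1; Y1)
lhs = mutual_information(joint, X1, X2 + Y1) + mutual_information(joint, X2, Y1)
rhs = mutual_information(joint, X2, X1 + Y1) + mutual_information(joint, X1, Y1)
print(lhs, rhs)
```

Both sides reduce to $H(X_1) + H(X_2) + H(Y_1) - H(X_1, X_2, Y_1)$, which is why they agree up to floating-point rounding.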



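Finally, we consolidate (A.13) from the range $R_2 \ge I(\hat X_2'; \hat X_1', \hat Y_1')$ with the just obtained (A.26) (for the range
$R_2 < I(\hat X_2'; \hat X_1', \hat Y_1')$) to get

$$\begin{aligned}
\bar E_{\mathcal{C}_2} \le \exp\Big\{n\Big\{ & 1(R_2 \ge I(\hat X_2';\hat X_1',\hat Y_1')) \times \big[-I(\hat X_1;\hat Y_1) - \rho I(\hat X_1';\hat Y_1') \\
& \qquad + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, (R_2 - I(\hat X_2;\hat X_1,\hat Y_1))\} + \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\big] \\
& + 1(R_2 < I(\hat X_2';\hat X_1',\hat Y_1')) \Big[ 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}) \times \big[-I(\hat X_1;\hat Y_1) - \rho I(\hat X_1';\hat Y_1') \\
& \qquad + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} \\
& \qquad + \min\{R_2 - (1-\rho) I(\hat X_2';\hat Y_1') - \rho I(\hat X_2';\hat X_1',\hat Y_1'),\, \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\}\big] \\
& \quad + 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}) \times \Big[ 1(R_2 \ge I(\hat X_2;\hat Y_1)) \big[-I(\hat X_1;\hat Y_1) - \rho I(\hat X_1';\hat Y_1') \\
& \qquad + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} + \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\big] \\
& \qquad + 1(R_2 < I(\hat X_2;\hat Y_1)) \big[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1) - \rho I(\hat X_1';\hat X_2',\hat Y_1')\big]\Big]\Big]\Big\}\Big\}.
\end{aligned} \tag{A.27}$$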

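As before, after expanding the first indicator $1(R_2 \ge I(\hat X_2'; \hat X_1', \hat Y_1'))$ against $1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'})$ and $1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})$, and combining terms, we obtain

$$\begin{aligned}
\bar E_{\mathcal{C}_2} \le \exp\Big\{n\Big\{ & 1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}) \big[-I(\hat X_1;\hat Y_1) - \rho I(\hat X_1';\hat Y_1') \\
& \qquad + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} \\
& \qquad + \min\{R_2 - (1-\rho) I(\hat X_2';\hat Y_1') - \rho I(\hat X_2';\hat X_1',\hat Y_1'),\, \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')),\, \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\}\big] \\
& + 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}) \times \Big[ 1(R_2 \ge I(\hat X_2;\hat Y_1)) \big[-I(\hat X_1;\hat Y_1) - \rho I(\hat X_1';\hat Y_1') \\
& \qquad + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} \\
& \qquad + \min\{\rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')),\, \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\}\big] \\
& \qquad + 1(R_2 < I(\hat X_2;\hat Y_1)) \big[-I(\hat X_1;\hat Y_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1) - \rho I(\hat X_1';\hat X_2',\hat Y_1')\big]\Big]\Big\}\Big\},
\end{aligned} \tag{A.28}$$

where, in simplifying, we have made use of the identity

$$\begin{aligned}
& 1(R_2 \ge I(\hat X_2';\hat X_1',\hat Y_1'))\, \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')) \\
& \quad + 1(R_2 < I(\hat X_2';\hat X_1',\hat Y_1')) \min\{R_2 - (1-\rho) I(\hat X_2';\hat Y_1') - \rho I(\hat X_2';\hat X_1',\hat Y_1'),\, \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\} \\
& = \min\{R_2 - (1-\rho) I(\hat X_2';\hat Y_1') - \rho I(\hat X_2';\hat X_1',\hat Y_1'),\, \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')),\, \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\},
\end{aligned}$$

along with

$$1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})\, 1(R_2 \ge I(\hat X_2';\hat X_1',\hat Y_1')) = 1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})\, 1(R_2 \ge I(\hat X_2;\hat Y_1))\, 1(R_2 \ge I(\hat X_2';\hat X_1',\hat Y_1')),$$

and finally

$$\begin{aligned}
& 1(R_2 \ge I(\hat X_2';\hat X_1',\hat Y_1'))\, \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')) + 1(R_2 < I(\hat X_2';\hat X_1',\hat Y_1'))\, \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')) \\
& = \min\{\rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')),\, \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\}.
\end{aligned}$$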
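The first and last of these indicator-to-min identities can be spot-checked numerically. The Python sketch below samples random parameter values and compares both sides; it assumes $0 \le \rho \le 1$, $0 \le \lambda \le 1$ and $I(\hat X_2'; \hat Y_1') \le I(\hat X_2'; \hat X_1', \hat Y_1')$ (the latter always holds for mutual informations), and the variable names `i_full` and `i_marg` are ours.

```python
import random


def first_identity_sides(r2, i_full, i_marg, rho, lam):
    """Both sides of the three-term min identity; i_full = I(X2';X1',Y1'),
    i_marg = I(X2';Y1')."""
    a = r2 - (1 - rho) * i_marg - rho * i_full
    b = rho * (r2 - i_full)
    c = rho * lam * (r2 - i_full)
    lhs = c if r2 >= i_full else min(a, b)
    return lhs, min(a, b, c)


def last_identity_sides(r2, i_full, rho, lam):
    """Both sides of the two-term min identity."""
    b = rho * (r2 - i_full)
    c = rho * lam * (r2 - i_full)
    lhs = c if r2 >= i_full else b
    return lhs, min(b, c)


random.seed(1)
max_gap = 0.0
for _ in range(10000):
    rho, lam = random.random(), random.random()
    i_marg = random.random()
    i_full = i_marg + random.random()  # enforces i_full >= i_marg
    r2 = 2.5 * random.random()
    l1, r1 = first_identity_sides(r2, i_full, i_marg, rho, lam)
    l2, r2_side = last_identity_sides(r2, i_full, rho, lam)
    max_gap = max(max_gap, abs(l1 - r1), abs(l2 - r2_side))
print(max_gap)
```

A random search of this kind is of course not a proof; it only guards against sign or index slips when the identities are transcribed.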

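We use (A.28) in (A.5), add over all vectors $\mathbf{y}_1$, decompose
all joint-type-dependent terms appearing in (A.5), as well
as the term $nH(\hat Y_1)$ arising from the summation over $\mathbf{y}_1$
per type, against the indicators $1(P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'})$ and
$1(P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'})$, and finally optimize over the types
$P_{\hat X_1\hat X_2\hat Y_1}$ and $P_{\hat X_1'\hat X_2'\hat Y_1'}$:

$$\begin{aligned}
E_{\mathcal{C}_1,\mathcal{C}_2}(P_{E_1}) \le \exp\Bigg\{n\Bigg\{ & -R_2 + \rho R_1 + \max\Bigg\{ \\
& \max_{\substack{P_{\hat X_1\hat X_2\hat Y_1},\, P_{\hat X_1'\hat X_2'\hat Y_1'}:\ P_{\hat X_1} = P_{\hat X_1'} = Q_1,\\ P_{\hat X_2} = P_{\hat X_2'} = Q_2,\ P_{\hat Y_1} = P_{\hat Y_1'},\\ P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}}} \Big[ (1-\rho\lambda) E_{\hat X_1\hat X_2\hat Y_1}\log q_1(Y_1|X_1,X_2) + \rho\lambda E_{\hat X_1'\hat X_2'\hat Y_1'}\log q_1(Y_1'|X_1',X_2') \\
& \qquad + H(\hat Y_1|\hat X_1) - \rho I(\hat X_1';\hat Y_1') + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} \\
& \qquad + \min\{R_2 - (1-\rho) I(\hat X_2';\hat Y_1') - \rho I(\hat X_2';\hat X_1',\hat Y_1'),\, \rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')),\, \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\}\Big], \\
& \max_{\substack{P_{\hat X_1\hat X_2\hat Y_1},\, P_{\hat X_1'\hat X_2'\hat Y_1'}:\ P_{\hat X_1} = P_{\hat X_1'} = Q_1,\\ P_{\hat X_2} = P_{\hat X_2'} = Q_2,\\ P_{\hat X_2\hat Y_1} = P_{\hat X_2'\hat Y_1'}}} \Big[ (1-\rho\lambda) E_{\hat X_1\hat X_2\hat Y_1}\log q_1(Y_1|X_1,X_2) + \rho\lambda E_{\hat X_1'\hat X_2'\hat Y_1'}\log q_1(Y_1'|X_1',X_2') \\
& \qquad + 1(R_2 \ge I(\hat X_2;\hat Y_1)) \big[H(\hat Y_1|\hat X_1) - \rho I(\hat X_1';\hat Y_1') + \min\{(1-\rho\lambda)(R_2 - I(\hat X_2;\hat X_1,\hat Y_1)),\, R_2 - I(\hat X_2;\hat X_1,\hat Y_1)\} \\
& \qquad\quad + \min\{\rho(R_2 - I(\hat X_2';\hat X_1',\hat Y_1')),\, \rho\lambda(R_2 - I(\hat X_2';\hat X_1',\hat Y_1'))\}\big] \\
& \qquad + 1(R_2 < I(\hat X_2;\hat Y_1)) \big[H(\hat Y_1|\hat X_1) + R_2 - I(\hat X_2;\hat X_1,\hat Y_1) - \rho I(\hat X_1';\hat X_2',\hat Y_1')\big]\Big] \Bigg\}\Bigg\}\Bigg\}.
\end{aligned} \tag{A.29}$$

Note that the term $H(\hat Y_1)$ mentioned above has been combined
with the term $-I(\hat X_1; \hat Y_1)$ appearing in all subcases of (A.28)
to yield the $H(\hat Y_1|\hat X_1)$ appearing throughout (A.29).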

The expression in Theorem 1 is obtained from (A.29)
by dropping the constraint $P_{\hat X_2\hat Y_1} \ne P_{\hat X_2'\hat Y_1'}$ from the first
maximization (which, given the continuity of the underlying
terms, is not really a constraint anyway), by noting that
if, in the resulting expression, the second maximization is
attained when $R_2 \ge I(\hat X_2; \hat Y_1)$, it will be dominated by
the first maximization so that the second maximization can
be restricted to the case $R_2 < I(\hat X_2; \hat Y_1)$, and finally by
negating the resulting exponent (and propagating the negation
as $-\max\{\cdots\} = \min\{-\cdots\}$ throughout).

Appendix B 
A Lower Bound to $E_{R,1}$

We can lower bound the maximization of @ over p and A 
by applying the min-max theorem twice, as follows. 
First we introduce a new parameter 8 and bound ® as 



Er 1 > min < R 2 — pR\ 

fe[Q,i] 



(B.l) 



mm 

( P # (1),>(1W1), 

eSi(Qi,Q 2 ) 



h (p, 



A,P*< 



,P,j 



/ 2 (p,A,P^ 



(2)^(2)-j>(2) , P-fi-'P) j c '{2)y'{2) 



v mm 

( P ^(2) *(2W 2 ) 

P ^'(2)^'(2) i> '(2)) 
A l A 2 r l 

(B.2) 

where $\bar\theta = 1 - \theta$ and we have dropped the constraint involving
$R_2$ from $\mathcal{S}_2$, resulting in a lower bound, and making $\mathcal{S}_2$
convex.

Letting $\gamma = \rho\lambda$, we claim that for fixed $\theta$, the expression
in (B.2) being minimized over $\theta$ above is convex in
(p, 7). This follows from the fact that for fixed P^w y-(i)-o(i) , 

x 1 ^2 * 1 

^i' (1) i' ll) 5 >,(1) ' P^(2)^(2)^>(2) , P^'(2)^'(2) ;i >'(2)), both fx 

1 2 1 1 2 1 12 1 

and f 2 are affine in (p, 7). The only problem would 
come from the max's appearing in these expressions, but 
it can be checked that these maximizations are indepen- 



dent of (p, 7) for fixed (P^ 



(1) v(i)V>(i) 



(1) y'Wv'W , 



l l -^2 J l •»M ^2 J l 

P^(2),X2),>(2), ^'(2)^(2)^(2)). Letting £ = {(x,y) : x G 



x (2) ± (2)^(2), 



[0, 1], y £ [0, x]}, we can thus apply the min-max theorem of 
convex analysis (twice) as follows 

Er,i 



> max min < R 2 — pi?i + x 

( P , 7 )Gsee[o,i] 



mm 

1 A 2 r l 

e<Si(Qi,Q 2 ) 



1 mm 

( P *(2) *(2W 2 ) 
1 A 2 y i 

P , 



h (p, 7, Py( 



(2)^(2)^. (2), P- (2) -(2) 



•'(*>) 



(2)^'(2) f .'(2)) 
1 A 2 r i 

GS 2 (Qi,Q 2 ) 



min max < i?2 — pR\ 

ee[o,i] 0,7)e2 



(•P. 

*1 A 2 y i 

e5i(Qi,Q 2 ) 



/1 (p> 7.^(1)^(1)^(0,^(1)^(0^(1)) + 



/a (p, 7, P*( 



(2)^(2)^,(2), P^ 



(2) -C-'(2) A'(2) 



( P ,>(2) *(2W2) 
1 A 2 y i 

P # '(2) # '(2) .'(2)) 
■*1 A 2 r l 

e5 a (Qi,Q 2 ) 



min max min < i?2 ■ 

ee[o,i](p,7)es(P (1) (1) ,p I 

A l A 2 y l A l -*2 1 *" 

eSi(Qi,Q 2 )xS 2 (Qi,Q 2 ) 

0/1 (A7.^wxp)y 1 w>- P i;p)x;wf;w) + 



pPi+ 



i, (2) if' f, (2) ' x w \ ,'-'v, 



'(2) y'(2)v>'(2) 



= min 
0e[o,i] (P. 



min max < R 2 — pRi + 

V ) ^ 1, ^ 1),P ^ (I3 *2 (1) *-i' cl),(p ' 7)6S I 

p x{ 2 >4 2 'f 1 ( 2 '- p x;( 2 )x 2 ( 2 )i- 1 '( 2 ) ) 



1 ^ 1 ""I "2 1 

(2) ^(2)^(2) < P ~'(2W'(2),-.'(2)) 

e5 1 (Q 1 ,Q 2 )x5 2 (Q 1: Q 2 ) 

2 (p,7,Py(2)^(2) fi (2),P^(2)^(2)-(2,) l 



(B.3) 



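Since, as noted above, for fixed $(\theta, P_{\hat X_1^{(1)}\hat X_2^{(1)}\hat Y_1^{(1)}}, P_{\hat X_1'^{(1)}\hat X_2'^{(1)}\hat Y_1'^{(1)}}, P_{\hat X_1^{(2)}\hat X_2^{(2)}\hat Y_1^{(2)}}, P_{\hat X_1'^{(2)}\hat X_2'^{(2)}\hat Y_1'^{(2)}})$, both $f_1$ and
$f_2$ are affine in $(\rho, \gamma)$, the inner maximization in (B.3) is
attained at one of the points $(\rho, \gamma) \in \{(0, 0), (1, 0), (1, 1)\}$.
After simplification, we obtain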



E R 1 > min min max 

ee[o,i] (^(1)^(1)^(1)^(1)^(1)^(1). 

p x( 2 )4 2 )v- 1 < 2 )'- p x;< 2 )x;( 2 )i> 1 '< 2 ) ) 

e5i(Qi,Q 2 )x5 2 (Qi,Q2) 
I(X 



(1). y-(l) Y 
2 '^1 ! I l 



^ + \I(xWtfW)-R 2 \ + 



+ 



E 



K2)| v(2)n 



#1 + 1 



- £ 



I(X'^;Y^)+ 



\I(X.. 



- E 



H(y 1 (2) |x 1 (2) )+ 



/(^ 2 );^ 2 ) ) F 1 '( 2 ))+7(xf;Xf),F/ 2 )) 



- £ 

Kl)|yW 



'(i).v'(i)^ 



H(Y^'\X^') + I(X 1 W ;Y 1 W )+ 
IiX^-X^,Y^) + \I(X^,X^^)-R^ 



+ 



- E 



log qi (1>' (2) |i; (2) , X™ )] - H(Y} 2) \x[ 2) )+ 



i{x^^T\y[ (2) )+i{xf ) -,xf\Y^) 



+ I{X™ ; X 2 (2) ) + J(x{ 2) ; X {2) , yf } ) 



(B.4) 



where in simplifying the third expression in the maximization
we have also exploited the constraints $H(\hat Y_1^{(1)}) = H(\hat Y_1'^{(1)})$
and $H(\hat Y_1'^{(2)}|\hat X_2'^{(2)}) = H(\hat Y_1^{(2)}|\hat X_2^{(2)})$.
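The vertex argument used for the inner maximization in (B.3) rests on a generic fact: an affine function on the triangle $\{(\rho, \gamma) : \rho \in [0, 1], \gamma \in [0, \rho]\}$ attains its maximum at one of the vertices $(0,0)$, $(1,0)$, $(1,1)$. The Python sketch below illustrates this with randomly drawn affine coefficients standing in for $f_1$ and $f_2$ (which are not evaluated here); the function names and the grid resolution are our own choices.

```python
import random

VERTICES = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]


def affine(coeffs, rho, gamma):
    """a*rho + b*gamma + c for coeffs = (a, b, c)."""
    a, b, c = coeffs
    return a * rho + b * gamma + c


def max_over_vertices(coeffs):
    return max(affine(coeffs, r, g) for r, g in VERTICES)


def max_over_grid(coeffs, steps=50):
    """Brute-force maximum over a grid on {0 <= gamma <= rho <= 1}."""
    best = float("-inf")
    for i in range(steps + 1):
        rho = i / steps
        for j in range(i + 1):
            gamma = j / steps
            best = max(best, affine(coeffs, rho, gamma))
    return best


random.seed(2)
worst = 0.0
for _ in range(200):
    coeffs = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
    worst = max(worst, abs(max_over_grid(coeffs) - max_over_vertices(coeffs)))
print(worst)
```

Because the grid contains the three vertices, the two maxima coincide, which is what allows the maximization over $(\rho, \gamma)$ to be replaced by a maximum of three explicit terms.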

For $R_1 = 0$ we can further simplify this expression. In
particular, for $R_1 = 0$, the first term in the inner maximization
is readily seen to be always smaller than the second term.
Additionally, the second and third terms are symmetric in
the primed and non-primed joint distributions, which, together
with the readily established joint convexity of the maximum
of these two terms on the constraint set, imply that the inner
minimization over the joint types is achieved when the primed
and non-primed joint distributions are equal, in which case the
two terms are equal. Therefore, at $R_1 = 0$ we have

E* R i > min min 

#e[0,l] (P-(l) -(1) (2)^(2)^(2)): 
X l X 2 Y l X l X 2 Y l 

P^l) =P jt (2) =QuP^l) =P jt (2 > =Q2 



e\D^+I(x[ 1) ;X^)+ 



\I(X^;X?\Y^)-R 2 \ + 



9 + I{X (2) ;X {2) ) + I{X (2) -X [2 \y± 2) ) (B.5) 



(2). y(2) V>(2)s 



or 



Next, we note the identities 

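$$I(\hat X_2; \hat X_1, \hat Y_1) = I(\hat X_1; \hat X_2) + H(\hat Y_1|\hat X_1) - H(\hat Y_1|\hat X_1, \hat X_2)$$
$$I(\hat X_1; \hat X_2, \hat Y_1) = I(\hat X_1; \hat X_2) + H(\hat Y_1|\hat X_2) - H(\hat Y_1|\hat X_1, \hat X_2)$$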



E 



X 1 X 2 Y 1 



log gi [Y^XuX, 



and use them, with the shorthand £)("') = 

and £)'("*) 

to rewrite the bound as 



D(P^ m) ]x ^) x ( m) I \li\P x [™) x ^) ) 



E* R l > min min max < 

P X< 2 » x( 2 )i>( 2 )' P v-'< 2 > v-'(2)v'(2)) 
'l x 2 z l A l A 2 Y l 

eS 1 (Q u Q 2 )xS 2 (QuQ2) 

D^+IiX^a^) + \I(x' 2 W ;Y; W ) R 2 \+ 

d^+kx^-xH- 



+ 



-Ri+6 



D^+I(X^-X^)+I(X[ W -,Y^ 



\I(X^;X[ {1) ,Y^) 

-(2). y(2) 



i?2l + 



+ I(X (2) -X {2) ) + I(XI 2) -X^ 2) X {2) ) 



Ri 



D'W+I(X^;X' 2 ^) + I(X^-,Y^)+ 



lliX^-X^^-R^ + 



E* R 1 > min 



D + I{X 1 ;X 2 ) + I(X 1 ;Y 1 ) 



+ \I{X 2 ;X l ,Y l )-R 2 \+ ; 



min D + I(X 1 ;X 2 )+I{X 1 ;X 2 ,Y 1 ) 



1 X 1 X 2 Y 1 - 

Px 1 =QuPx 2 =Q2 



(B.6) 



where $D = D(P_{\hat Y_1|\hat X_1\hat X_2} \| q_1 | P_{\hat X_1\hat X_2})$.
Simplifying $E_{B,1}$ at $R_1 = 0$ gives



-^B.l = max • 



X 1 X 2 Y 1 ■ 

p Xi =Qi,Px 2 =Q2 



L» + 7(X 1 ;X 2 )+7(je i ;F 1 ) 



mm 



D + J(Xi;X 2 )+ 
f \I(X 1 ;Y 1 )+I(X 2 ;X 1 ,Yi)-R2\ + 



r X 1 X 2 Y 1 - 

P Xl =Qu p x 2 =Q2 



D + /(X 1 ;X 2 ) + /(X 1 ;X 2 ,Y 1 ) 



(B.7) 

which is seen to be no bigger than the above lower bound on
$E^*_{R,1}$, since $|I(\hat X_2; \hat X_1, \hat Y_1) - R_2|^+ \ge 0$, $I(\hat X_1; \hat X_2, \hat Y_1) \ge I(\hat X_1; \hat Y_1)$, and $I(\hat X_1; \hat Y_1) + |I(\hat X_2; \hat X_1, \hat Y_1) - R_2|^+ \ge |I(\hat X_1; \hat Y_1) + I(\hat X_2; \hat X_1, \hat Y_1) - R_2|^+$.

Another application of the lower bound (B.4) is in determining
the set of rate pairs $(R_1, R_2)$ for which $E^*_{R,1} > 0$.
Let $(X_1, X_2)$ be independent with marginal distributions $Q_1$
and $Q_2$, and let $Y_1$ be the result of $(X_1, X_2)$ passing through
the channel $q_1$. We shall argue that if $R_1 < I(X_1; Y_1) + |I(X_2; X_1, Y_1) - R_2|^+ = I(X_1; Y_1) + |I(X_2; Y_1|X_1) - R_2|^+$
and $R_1 < I(X_1; X_2, Y_1) = I(X_1; Y_1|X_2)$, then the expression
(B.4) must be greater than 0. Indeed, for the expression
(B.4) to equal 0, we see from the first term in the inner
maximum that the minimizing $\theta$ and joint distributions must
satisfy one of the following: case 1: $\theta = 1$, $D^{(1)} = 0$,
and $I(\hat X_1^{(1)}; \hat X_2^{(1)}) = 0$; case 2: $\theta = 0$, $D^{(2)} = 0$, and
$I(\hat X_1^{(2)}; \hat X_2^{(2)}) = 0$; or case 3: $0 < \theta < 1$, $D^{(1)} = D^{(2)} = 0$,
and $I(\hat X_1^{(1)}; \hat X_2^{(1)}) = I(\hat X_1^{(2)}; \hat X_2^{(2)}) = 0$. If case 1 holds
then $(\hat X_1^{(1)}, \hat X_2^{(1)}, \hat Y_1^{(1)})$ necessarily have the same joint
distribution as $(X_1, X_2, Y_1)$, in which case we see from the
third term in the maximum in (B.4) that $R_1 \ge I(X_1; Y_1) + |I(X_2; X_1, Y_1) - R_2|^+$. Similarly, if case 2 holds then it
follows that $(\hat X_1^{(2)}, \hat X_2^{(2)}, \hat Y_1^{(2)})$ have the same joint distribution
as $(X_1, X_2, Y_1)$, in which case it follows again from the third
term in the maximum that $R_1 \ge I(X_1; X_2, Y_1)$. Finally, if case
3 holds then both $(\hat X_1^{(1)}, \hat X_2^{(1)}, \hat Y_1^{(1)})$ and $(\hat X_1^{(2)}, \hat X_2^{(2)}, \hat Y_1^{(2)})$
have the same distribution as $(X_1, X_2, Y_1)$, in which case,
after writing $R_1 = \theta R_1 + \bar\theta R_1$, we see again that either $R_1 \ge I(X_1; Y_1) + |I(X_2; X_1, Y_1) - R_2|^+$ or $R_1 \ge I(X_1; X_2, Y_1)$
must hold. Thus, the three cases together establish the above
claim that if $R_1 < I(X_1; Y_1) + |I(X_2; Y_1|X_1) - R_2|^+$ and
$R_1 < I(X_1; Y_1|X_2)$ then the expression (B.4), and hence
$E^*_{R,1}$, must be greater than 0. It can be checked that this region
is equivalent to

$$\{R_1 < I(X_1; Y_1)\} \cup \big\{\{R_1 + R_2 < I(Y_1; X_1, X_2)\} \cap \{R_1 < I(X_1; Y_1|X_2)\}\big\}$$

which is represented in Fig. 1 in Section IV. It is shown in [11]
that for the ensemble of constant composition codes comprised
of i.i.d. codewords uniformly distributed over the types $Q_1$
and $Q_2$, the exponential decay rate of the average probability